The Bicentennial Census 



NEW DIRECTIONS FOR 
METHODOLOGY IN 1990



Constance F. Citro and 
Michael L. Cohen, Editors 



Panel on Decennial Census Methodology 
Committee on National Statistics 

Commission on Behavioral and 
Social Sciences and Education 

National Research Council 



NATIONAL ACADEMY PRESS 
Washington, D.C. 1985 



National Academy Press 2101 Constitution Avenue, N.W. Washington, DC 20418



NOTICE: The project that is the subject of this report was approved by the Governing
Board of the National Research Council, whose members are drawn from the councils of
the National Academy of Sciences, the National Academy of Engineering, and the
Institute of Medicine. The members of the committee responsible for the report were
chosen for their special competences and with regard for appropriate balance.

This report has been approved by a group other than the authors according to
procedures approved by a Report Review Committee consisting of members of the
National Academy of Sciences, the National Academy of Engineering, and the Institute
of Medicine.

The National Research Council was established by the National Academy of Sciences in
1916 to associate the broad community of science and technology with the Academy's
purposes of furthering knowledge and of advising the federal government. The Council
operates in accordance with general policies determined by the Academy under the
authority of its congressional charter of 1863, which establishes the Academy as a
private, nonprofit, self-governing membership corporation. The Council has become the
principal operating agency of both the National Academy of Sciences and the National
Academy of Engineering in the conduct of their services to the government, the public,
and the scientific and engineering communities. It is administered jointly by both
Academies and the Institute of Medicine. The National Academy of Engineering and the
Institute of Medicine were established in 1964 and 1970, respectively, under the charter
of the National Academy of Sciences.

This project was supported by funds from the Bureau of the Census, U.S. Department of
Commerce.



Library of Congress Catalog Card Number 85-51816 
International Standard Book Number 0-309-03626-7 



Printed in the United States of America 




Panel on Decennial Census Methodology 



JOHN W. PRATT (Chair), Graduate School of Business, 

Harvard University 
PASTORA SAN JUAN CAFFERTY, School of Social Service 

Administration, University of Chicago 
ANSLEY J. COALE, Office of Population Research, Princeton

University 
DONALD DESKINS, Department of Sociology, University of 

Michigan 
IVAN P. FELLEGI, Deputy Chief Statistician, Statistics Canada, 

Ottawa 
WAYNE A. FULLER, Department of Statistics, Iowa State 

University 
JOSEPH B. KADANE, Department of Statistics, Carnegie-Mellon 

University 
BENJAMIN KING, Research Statistics Group, Educational 

Testing Service, Princeton, N.J. 
ALBERT MADANSKY, Graduate School of Business, University 

of Chicago 
ALBERTO PALLONI, Center for Demography and Ecology, 

University of Wisconsin 

JOHN ROLPH, Rand Corporation, Santa Monica, Calif. 
COURTENAY M. SLATER, CEC Associates, Washington, D.C. 
JOSEPH WAKSBERG, Westat, Inc., Rockville, Md. 

CONSTANCE F. CITRO, Study Director 
MICHAEL L. COHEN, Research Associate 






Committee on National Statistics 



LINCOLN E. MOSES (Chair), Department of Statistics, Stanford

University 
LEO BREIMAN, Department of Statistics, University of 

California, Berkeley 
JOEL E. COHEN, Laboratory of Populations, The Rockefeller 

University 
WAYNE A. FULLER, Department of Statistics, Iowa State 

University 
SEYMOUR GEISSER, School of Statistics, University of 

Minnesota 
F. THOMAS JUSTER, Institute for Social Research, University of

Michigan 
JANE A. MENKEN, Office of Population Research, Princeton 

University 
JOHN W. PRATT, Graduate School of Business, Harvard 

University 
S. JAMES PRESS, Department of Statistics, University of 

California, Riverside 
CHRISTOPHER A. SIMS, Department of Economics, University

of Minnesota 
BURTON H. SINGER, Department of Statistics, Columbia 

University 

COURTENAY M. SLATER, CEC Associates, Washington, D.C.
JUDITH M. TANUR, Department of Sociology, State University

of New York, Stony Brook 
DAVID L. WALLACE, Department of Statistics, University of 

Chicago 

EDWIN D. GOLDFIELD, Executive Director 
MIRON L. STRAF, Research Director
MICHELE W. ZINN, Administrative Associate 






Contents 



Preface xiii

Acknowledgments xv 

1 Introduction 1 

The Planning Cycle for 1990, 4 

The Importance of Choice of Methodology for 1990, 5 

Proposed Changes in Methodology, 10 

Independent Reviews of Decennial Census Plans, 13 

Major Themes of the Report, 17 

Overview of the Report and Recommendations, 19 

2 Purposes and Uses of the Decennial Census 37 

Historical Uses of Census Data, 39 
Distinguishing Features of the Modern Census and 

Its Uses, 40 

Effects of Census Error on Key Uses: Review of Research, 51 
Effects of Error in Postcensal Estimates, 70 
Appendix 2.1 - State and Local Government Uses of Census 

Data, 75 

State Uses of Census Data, 75 

Local Agency Uses of Census Data, 79 
Appendix 2.2 - Census Data Use in New Jersey: A Case

Study, 81 

Distribution Channels, 81 

Profile of State Data Center Users and Uses, 83 

Government Uses of Census Data in New Jersey, 86 

3 Census Methodology: Prior Practice and Current Test Plans 87 

1980 Census Methodology, 88 
Methodology Used in Previous Censuses, 98 
Methodology Used in Other Western Countries, 99 
Census Bureau Research Plans for 1990, 104 
Assessment and General Recommendations, 109 
Appendix 3.1 - An Overview of Sequential Hot-Deck 

Imputation, 115 
Appendix 3.2 - A Description of Iterative Proportional 

Fitting, 118 



4 Evaluating the Decennial Census: Past Experience 120 

Methods of Coverage Evaluation, 122 
Coverage Evaluation Prior to 1980: Micro-Level 

Methods, 124 
Coverage Evaluation Prior to 1980: Macro-Level 

Methods, 133 

The 1980 Post-Enumeration Program, 139 
1980 Demographic Analysis, 148 
Recent Use of Administrative Lists for Coverage 

Evaluation, 151 
Considerations for Assessing Alternative Coverage 

Evaluation Methods, 157 

Error Profiles for Coverage Evaluation Methods, 158 
Appendix 4.1 - An Introduction to Estimation from Multiple 

Lists, 166 

Dual-System Estimation, 167 

Other Estimation Approaches, 168 
Appendix 4.2 - Operational Aspects and Modeling of 

Computer Matching, 171 

Operational Difficulties of Matching, 171 

General Algorithm of Matching, 172 

Blocking, 172 

Variable Selection, 173 

A Mathematical Model for Record Linkage, 174 

5 Taking the Census I: Improving the Count 176

Hard-to-Count Groups in the Census: What Is Known, 177 
Coverage Improvement Programs: Past Experience, 187 
Census Bureau Plans for Testing Coverage Improvement 

Programs for 1990, 202 

Needed Research on Undercount and Overcount, 204 
Issues in Coverage Improvement: Questionnaire 

Content, 205 
Issues in Coverage Improvement: Special Enumeration 

Procedures, 220 
Appendix 5.1 - Gross Omissions and Gross

Overenumerations in the Census, 224 

Gross Omissions of People, 224 

Gross Overenumerations of People, 224 

Housing Coverage Studies, 240 

6 Taking the Census II: The Uses of Sampling and
Administrative Records 243

Sampling in the Census, 243 
Sampling for the Count, 246 
Sampling for Content, 257 



The Use of Administrative Records and Sampling for 

Improved Accuracy of Content, 262 
Appendix 6.1 - Cost Estimates for a Sample Census, 268 
Appendix 6.2 - Illustrative Follow-up Scenario Using 

Sampling, 270 
Appendix 6.3 - Improving Data on Housing Structure Items: 

A Suggested Method, 273 

7 Adjustment of Population Counts 275 

The Need for Adjustment, 275 

Evaluating Adjustment: Loss Functions and Yardsticks, 278 

Considerations of Internal Consistency, 292 

Considerations of Timing, 296 

Procedures for Adjustment, 298 

Census Bureau Plans for Research and Testing on 

Adjustment, 307 

Priorities for Research and Testing on Adjustment, 310 
Appendix 7.1 - A Quick Look at Loss Functions and

Apportionment, 313 
Appendix 7.2 - An Introduction to Hierarchical Bayesian 

Techniques, 315 
Appendix 7.3 - Aggregation of Synthetic Estimates: A 

Counterintuitive Example, 318 

8 Measuring the Completeness of the 1990 Census 319 

Recapitulation of Major Issues in Coverage Evaluation, 319 
Appraisal of Census Bureau Pretest Plans for Coverage 

Evaluation, 321 
The 1990 Demographic Analysis Program: Possible 

Improvements, 330 
The Reverse Record Check Program: Considerations for 

1990, 333 
The 1990 Post-Enumeration Program: Possible 

Improvements, 335 
Systematic Observer Methodology: Considerations 

for 1990, 349 
Appendix 8.1 - The Population of Illegal Aliens: Methods 

and Estimates, 355 
Appendix 8.2 - Estimates of Variance and Cost for a Large 

Systematic Observer Study, 360 

References 361 

Biographical Sketches of Panel Members and Staff 385 

Index 389 



Tables 



Table 2.1 Uses of Census Data in Selected Federal
Grant-in-Aid Programs, 60
Table 2.2 Percentage of Error in County Population Estimates
for 1980 by Metropolitan-Nonmetropolitan
Residence, 71
Table 2.3 Selected Measures of Accuracy of County
Population Estimates for 1980, by Size of County,
72
Table 2.4 Percentage of Error in Subcounty Population
Estimates for 1980, 73
Table 2.5 Selected Measures of Accuracy of Subcounty
Population Estimates for 1980, by Size of Area, 74
Table 2.6 Data Requests Received by the New Jersey State
Data Center's Lead Agency by Type of User and
Data Source, 1983, 83

Table 3.1 Notation for Iterative Proportional Fitting for a
Small Two-Way Table, 119

Table 4.1 Response and Match Resolution Rates for April
1980 Current Population Survey (P Sample) by
Race, 146
Table 4.2 Scheme to Identify Various 1980 PEP Estimates, 147
Table 4.3 1980 PEP Estimates of Percentage Undercoverage
for Demographic Groups at the National Level, 148
Table 4.4 Two-Way Table Underlying Simple Dual-System
Estimation, 167
Table 4.5 Multiple List Quantities for the Case of Two Lists
and a Census: Three-Way Contingency Table, 169

Table 5.1 Net Undercount Rates by Race and Sex From 
Demographic Analysis, 1950 to 1980 Decennial 
Censuses (estimated population minus census 
population as a percentage of estimated 
population), 178 

Table 5.2 Additions and Costs of 1970 Census Coverage 
Improvement Programs, 190 

Table 5.3 Additions and Costs of 1980 Census Coverage 
Improvement Programs, 196 



Table 5.4 Household Relationship by Relative Gross Omission 
Rate for a Sample of Persons, Post-Enumeration 
Program-Census Match (1980, PEP Series 3-8), 226 

Table 5.5 Ethnicity and Mail Nonreturn Rate by Relative 
Gross Omission Rate for a Sample of Persons, 
Post-Enumeration Program-Census Match (1980, 
PEP Series 3-8), 227 

Table 5.6 Type of Place and Mail Nonreturn Rate by Relative 
Gross Omission Rate for a Sample of Persons, 
Post-Enumeration Program-Census Match (1980, 
PEP Series 3-8), 228 

Table 5.7 Ethnicity, Type of Return, and Income by Relative 
Gross Omission Rate for a Sample of Income Tax 
Filers Ages 18 to 64, IRS-Census Match (1980), 231 

Table 5.8 Household Relationship by Relative Gross Omission 
Rate for a Sample of Persons, CPS-Census Match 
(1960), 232 

Table 5.9 Employment Status by Sex and Income by Race by 
Relative Gross Omission Rate for a Sample of 
Persons, CPS-Census Match (1960), 233 

Table 5.10 Household Relationship and Education Level by 
Relative Gross Omission Rate for a Sample of 
Persons, Post-Enumeration Survey-Census Match 
(1950), 234 

Table 5.11 Occupation by Sex and Income by Relative Gross 
Omission Rate for a Sample of Persons, 
Post-Enumeration Survey-Census Match (1950), 235 

Table 5.12 Ethnicity and Household Relationship by Relative 
Gross Overenumeration Rate for a Sample of 
Persons, Post-Enumeration Program-Census Match 
(1980, PEP Series 3-8), 239 

Table 5.13 Percentage of Gross Omissions by Enumeration 
Status of the Building and Type of Area, for 
Samples of Occupied Units, CPS-Census Match 
(1970 and 1960), 242 

Table 6.1 Coefficient of Variation for Estimates of the Black 

Population by Size of Place and Size of Sample, 268 

Table 6.2 Hypothetical Distribution of Follow-up Callbacks 
and Costs, Scenario 1: Up to Four Callbacks 
Permitted in First Stage of Follow-up, 271 






Table 6.3 Hypothetical Distribution of Follow-up Callbacks
and Costs, Scenario 2: Two Callbacks Permitted in
First Stage of Follow-up, Remaining Cases Sampled
at 25% in Second Stage of Follow-up, 272

Table 7.1 Simple Example of Synthetic Estimation, 30
Table 7.2 An Example: Problems in Aggregating Synthetic
Estimates, 318

Table 8.1 Types and Magnitudes of Estimates of Illegal
Migrants, 356



Figures 

5.1 Percentage Net Undercount Rates by Age, Race, 
and Sex: 1980 Census (legally resident population, 
determined from demographic analysis), 179 

5.2 Percentage Net Undercount Rates by Age for Black 
Men: 1960-1980 Censuses (determined from 
demographic analysis), 185 

5.3 Race and Hispanic Origin Questions on the 1980 
Census Short Form, 208 

5.4 Race and Hispanic Origin Information That May Be 
Required in 1990, 210 

5.5 Coverage Questions in the 1980 Census, 215 

Census Costs Estimated for Varying Sampling 
Rates, 269 






Preface 



In 1982 the American Statistical Association Technical Panel on
the Census Undercount recommended "that the Bureau of the
Census sponsor an outside technical advisory group on
undercount estimation and related problems" (American Statistical
Association, 1984:256). Partly on the basis of that recommendation,
the Census Bureau requested the Committee on National Statistics
to establish a panel: (1) to suggest research and experiments, (2) to
recommend improved methods, and (3) to guide the Census
Bureau on technical problems in appraising contending methods
with regard to the conduct of the decennial census.

In response to that request, the Panel on Decennial Census
Methodology was established and charged with investigating three
major issues from a technical viewpoint, setting aside legal
considerations:

(1) Adjustment of census counts and characteristics, including
exploration of formal criteria to evaluate measures of undercount
and alternative adjustment procedures.

(2) Uses of sampling in the decennial census, specifically,
investigation of whether sampling for coverage improvement and
of nonrespondents for follow-up can improve accuracy at given
cost.

(3) Uses of administrative records, including investigation of the
possible utility of various types of records for improving the
accuracy of census counts and the efficiency of census operations.

At our first meeting, in January 1984, we took a broad view of the
charge and identified additional topic areas beyond those listed for
possible investigation. We decided that in order to reach sensible
conclusions regarding a choice of methodology for the decennial
census it was critical to examine uses of census data and the degree
of accuracy needed to satisfy each use. We also decided that it was
essential to conduct a thorough review of procedures for improving
census coverage and of methods of evaluating the completeness of
coverage achieved in the census. Well-designed and well-executed
coverage improvement programs can importantly reduce errors in
the census. Well-designed and well-executed coverage evaluation
programs inform users about the quality of the census results and
are the source of input data necessary for any type of adjustment of
the census counts to reduce errors further.

The panel produced an interim report that focused on
recommendations for improvements in census methodology that
warranted early investigation and testing. That report, Planning the
1990 Census: Priorities for Research and Testing (National Research
Council, 1984), addressed three topic areas that were central to the
original charge: (1) uses of sampling for the census count, (2)
methodologies for evaluating completeness of coverage of the
census, and (3) issues related to the adjustment or modification of
census counts and characteristics. In addition, the report reviewed
the Census Bureau's plans for the 1985 pretest of a two-stage
methodology for conducting the census.
This report updates and expands our ideas and conclusions
about decennial census methodology. In it we endeavor to assess
the merits of investigating proposed changes in the decennial
census that represent important departures from past practice and,
specifically, to recommend concepts and procedures that we
believe the Census Bureau should place high on its list of priority
objectives for research and testing directed toward the nation's
bicentennial census in 1990.

Our report, following an introduction and overview that
presents in collected form the recommendations of the panel,
includes three background chapters, on purposes and uses of the
decennial census, on methodology of prior censuses and current
1990 census testing plans, and on past experience with coverage
evaluation. The report offers general and specific planning
recommendations in five areas: (1) overall strategy for planning the
1990 census, (2) procedures for coverage improvement as part of
the census, (3) uses of sampling and administrative records in
taking the census, (4) adjustment of census counts and
characteristics, and (5) measuring the completeness of the 1990 census.

With regard to the key issue of adjustment of census counts, we
argue for balance between efforts to achieve a complete
enumeration and efforts to improve the accuracy of census figures through
adjustment procedures. We believe that adjustment cannot be
viewed as an alternative to obtaining as complete a count as
possible through cost-effective means. However, the evidence is
overwhelming that no counting process will in fact enumerate
everyone. Given our belief that the ultimate goal of the census
should be the accuracy of the figures, we recommend that the
Census Bureau pursue a vigorous program of research on coverage
evaluation and adjustment methods that, if successful, will
permit adjustment of the 1990 census counts.

With regard to issues related to adjustment and other topics
considered by the panel, the emphasis of our report is on the
extensive research and testing needed during the next few years to
support sound decisions regarding the choice of a particular
methodology for the 1990 and subsequent censuses. Given the
limited number of testing opportunities available compared with
the range of ideas that appear attractive to try out, we believe it is
imperative for the Census Bureau to choose among these
objectives. We have therefore endeavored to provide timely advice
regarding what we believe are the most promising avenues to
pursue.

John W. Pratt, Chair
Panel on Decennial
Census Methodology






Acknowledgments 



The Panel on Decennial Census Methodology wishes to thank 
the many people who contributed to the preparation of this report. 

The staff of the Bureau of the Census has been extremely helpful, 
and we would like to thank particularly certain individuals for their 
assistance. Peter Bounpane, assistant director for demographic 
censuses, and Barbara Bailar, associate director for statistical 
standards and methodology, have been very generous with their time
in providing assistance to the panel. Other Census Bureau staff 
who contributed valuable information include: Frederick Bohme, 
Charles Cowan, Gregg Diffendal, Richard Engels, Robert Fay, III, 
Penelope Harvison, Roger Herriot, Catherine Hines, Howard 
Hogan, Matt Jaro, Bruce Johnson, Charles Jones, Eli Marks, 
Nampeo McKenney, Susan Miskura, Jeffrey Passel, Paula 
Schneider, John Thompson, David Whitford, Kirk Wolter, and 
Arthur Young. 

A number of members of the broader statistical community gave 
very helpful presentations at meetings of the panel: Eugene 
Ericksen of Temple University and Mathematica Policy Research, 
Monroe Sirken of the National Center for Health Statistics, John 
Tukey of Princeton University and AT&T Bell Laboratories, and 
Kenneth Hill and Robert Warren of the Committee on National 
Statistics. We would also like to thank Harold Nisselson of Westat, 
Inc., who prepared background material on coverage evaluation 
programs, Margaret Boone, anthropologist, who made a helpful 
presentation on ethnographic research, and Judith Rowe of the 
Princeton-Rutgers Census Data Project for very useful comments 
on Chapter 2 of the report. 

The members and staff of the Committee on National Statistics 
were valuable resources to the panel in the course of our work. 
Stephen Fienberg, as chair of the committee, gave us the benefit of 
his enthusiastic support, direction, and encouragement in the early 
stages of the project. Committee member David Wallace read the 
draft report with great care and made detailed, incisive comments 
that improved the report in important ways. Staff members Edwin 
Goldfield, Thomas Jabine, Daniel Levine, Margaret Martin, and 
Miron Straf provided useful guidance and assistance. Kenneth Hill 
made very helpful comments on Chapters 4 and 8 of the report. 






[...], editor for the Commission on Behavioral and Social
Sciences and Education, for her fine technical editorial work [...]
Committee on National Statistics [...] Commission on Behavioral
and Social Sciences and Education [...] who reviewed the report
[...] cogent comments.

I would like to express special [...] panel's staff.
Constance Citro, as study director, stayed on top of everything,
attended to myriad administrative [...] meetings. Her ability at
the same time to [...] and clearly about any [...] was continually
amazing. [...] schedules no one else thought possible. The buck
could be passed to her at any time and often was, yet her
efforts [...]. She was aided nobly by Michael Cohen, research
associate. [...]

John W. Pratt






1 

Introduction 



Periodic censuses of population are a long-established 
tradition in the United States of America, with roots
going back to the earliest years of the colonial period. 
The royal colony of Virginia conducted the first census 
in North America in the early seventeenth century, and 
censuses of individual colonies were frequently attempted 
during the colonial era (Bureau of the Census, 1970b:3). 

Political necessity led to the requirement for a 
periodic complete enumeration of the population in the 
new nation formed after the American Revolution. In the 
compromise between large and small states made at the 
1787 Constitutional Convention, the delegates voted to 
provide equal representation for each state in the Senate 
and representation proportional to population in the 
House of Representatives; the population of each state
was to be determined through a decennial census. Article
I, section 2, of the Constitution stipulates:

Representatives and direct Taxes shall be appor- 
tioned among the several States which may be 
included within this Union, according to their 
respective Numbers. . . The actual Enumeration 
shall be made within three Years after the first 
Meeting of the Congress of the United States, and 
within every subsequent Term of ten Years, in such 
Manner as they shall by Law direct. 

Although fundamental issues of the structure of
government provided the motivation for the U.S. decennial
census, the country's leaders recognized from the
beginning that the census could be a valuable source of
information for many other purposes. James Madison noted
in 1789 (Bureau of the Census, 1970b:4) that Congress:



had now an opportunity of obtaining the most 
useful information for those who should hereafter 
be called upon to legislate for their country, if 
this bill was extended to embrace some other 
objects besides the bare enumeration of the 
inhabitants; it would enable them to adapt the
public measures to the particular circumstances of 
the community. 

The first census in 1790 asked the age, sex, and race
of each resident. During the next 100 years, the census
became firmly established as an important information
resource. The centennial census in 1890 asked questions
on more subjects than any census before or since,
including 30 items on the basic population questionnaire,
several housing inquiries, and special inquiries about
decedents, inmates of almshouses and prisons, Indians on
and off reservations, Civil War veterans and widows of
veterans, and several categories of mentally and
physically disabled people (Bureau of the Census, 1973b:74-91).

Work is now under way to plan for the nation's
bicentennial census of population and housing, scheduled to
take place on April 1, 1990. Reflecting a long-standing
tradition of improvement and modification to meet changing
information needs and to take advantage of technological
advances, census-taking in the twentieth century has come
to differ in many important respects from census-taking
in the nineteenth century. The 1990 census will
undoubtedly incorporate the following features that have been
introduced into modern U.S. censuses:

* As has been true since 1910, the 1990 census will
be directed by a permanent organization, the U.S. Bureau
of the Census, with an experienced, professional staff in
charge of planning and supervising the operation.

* As in every census since 1940, statistical
sampling methods will be used to obtain responses to many
census items, so that a large volume of useful information
can be gathered without placing the burden on every
household of responding to all questions (the 1980 census asked
7 population and 12 housing items of all households,
while about 20 percent of households were asked an
additional 26 population and 20 housing questions).

* As in the 1970 and 1980 censuses, the U.S. Postal
Service will deliver most of the census questionnaires,
and households will be asked to mail their completed
questionnaires to census offices. Enumerators will
telephone or visit only those households that do not
completely respond (95 percent of households were sent
questionnaires by mail in 1980 and 83 percent of them
returned their questionnaires by mail).

* As has been true since 1960, large computers will
be used to process the census returns in a relatively
short span of time; in contrast, the 1890 census required
almost a full decade to process, even with the
introduction of punchcard machines to help the clerical work
force.

* As in every census since 1950, intensive effort
will be devoted to evaluating the completeness of
coverage of the total population and of important subgroups
and geographic areas.

The 1990 census will undoubtedly differ as well from
the most recent censuses in the United States. Most of
the differences are likely to represent incremental
improvements and modifications to tried and tested
procedures: for example, mailout-mailback techniques may
be extended to the remaining 5 percent of the population
residing in sparsely settled rural areas that enumerators
personally canvassed in 1980. But pressures are growing
in this country, as in other Western nations, to address
the problems of rising costs of traditional census
practices on one hand and to satisfy expressed needs for
greater accuracy in the numbers on the other.
Consequently, exploration of changes in methods and techniques
that mark a greater break with tradition is under way:
for example, one proposal that has received much attention
is the use of statistical techniques to adjust the field
counts for deficiencies in the enumeration.

In the past, major changes in census methodology, such 
as the use of sampling for content and mailout-mailback 
enumeration, were often made on a small scale in one 
census and then more fully implemented in the next. The 
1990 census will be part of a continuing evolution that 
may lead to a methodology in the twenty-first century 
that differs as significantly from current methodology as 
current methodology differs from that of the nineteenth 
century. This report is an attempt to assess the merits 
of proposed changes in the next decennial census that 
represent important departures from past practice and, 
specifically, to recommend concepts and procedures that 
should be assigned high priority in the Census Bureau's 
research and testing program for the nation's bicentennial 
census. 



THE PLANNING CYCLE FOR 1990






[...] results that are [...] census planning.

To the general public and many casual users of census
data, it may appear [...] April 1, 1990, [...] pretests, [...]
important question. The Census Bureau's 1990 [...] program began
in spring 1984 with tests of [...] list compilation
methods in several localities around the country (Bureau
of the Census, 1984b). [...] large-scale pretests were
fielded in spring 1985 [...] will also be conducted in
1986 and 1987. Finally, the research and testing
program will culminate in [...]

are corrections of problems encountered in the field, not
innovations in census procedures at that late date. This
schedule not only compresses into a few years the oppor- 
tunities to test new methodology but also compresses the 
time available to evaluate the test results. The fact 
that tests are scheduled no more than a year apart makes 
it very difficult to complete the analysis of one set of 
tests in time to affect the design of the next set. 

In addition to the compressed time schedule for testing
and research, two other critical factors affect the
ability of the Census Bureau to modify census methodology:
staff and budget resources. The Census Bureau has long
been known for the high quality and dedication of its
technical staff. The current budget for research on
decennial census methodology, particularly for research
on the undercount, is large by the standards of earlier
censuses. Nevertheless, no agency of government,
particularly in the constrained world of the 1980s, can
expect to have sufficient staff or resources to try out
more than a few promising ideas and concepts. Pressures
in the next few years to reduce the federal government's
large deficit may make it more than usually difficult to
obtain adequate staff and funding to carry out a thorough
research and testing program for 1990. Hence, it is
critical to designing the best census for 1990 and to
being in the best position to plan further design changes
for 2000 that the Census Bureau make the most of the
testing opportunities afforded over the next few years
and establish priorities for testing and research wisely.



THE IMPORTANCE OF CHOICE OF METHODOLOGY FOR 1990

Controversy surrounding population censuses has as long a
history in the United States as census-taking itself.
According to one review (Bureau of the Census, 1982a:
App.IIIb:73), censuses conducted during the colonial
period, generally at the direction of the Privy Council
or the British Board of Trade, "were seldom regarded as
complete or successful, as people perceived them being
for the purposes of taxation or conscription and were
evasive and uncooperative." The decennial censuses
conducted in the new nation had a constitutional mandate
according them legitimacy and support. Moreover, one
historian (Conk, 1984:7) has noted that: "After the
first few censuses, Americans became increasingly
interested in the census results . . . [which] showed
that the population was growing steadily and extremely
rapidly." It quickly became evident in the early
nineteenth century, however, that not all areas were sharing
equally in population growth and that reapportionment
based on census results meant substantial shifts in
political power. Hence, as the same historian commented
(Conk, 1984:8):

It is not surprising therefore that nineteenth
century Americans who were pleased with the
overall thrust of population change claimed that
the census proved the virtue of the American way
of life or the American system of government.
Conversely, those who felt shortchanged by
reapportionment or were concerned about the
tendencies of population change challenged both
the census and the apportionment system.

The first extensive criticism of the census by statis-
ticians occurred in 1843 when the American Statistical
Association (ASA) issued a lengthy report that documented
glaring errors in the data on education, occupation, and
especially the classification by race of persons identi-
fied as insane, idiotic, and deaf and dumb. The ASA
recommended that these results should be corrected or, at
the least, disavowed. Problems with both undercount and
fraudulent additions to the count were documented in the
early censuses (Bureau of the Census, 1982a:App.IIIb:
81-83).

Congress did not as a general rule respond directly to
these criticisms, although occasionally it acted to adjust
the apportionment of the House when there was strong
evidence of gross deficiencies in the population count.
Congress gave a third representative to Alabama in 18[..]
when the claim was made that the 1820 census omitted [...]
counties and in 1860 awarded an additional seat to
California because of problems with the census in that
state (Bureau of the Census, 1982a:App.IIIb:82). These
actions were politically much more palatable than similar
actions would be today, because reapportionment legisla-
tion up through 1910 added representatives to accommodate
population growth rather than allocating a fixed number
of seats among the states.

Despite the questions raised about the population
enumeration in the past, a review of decennial census
history suggests that social and political forces have
converged in recent years to make the census in this
country and in other countries as well a matter of
demonstrably greater controversy than before. Several
factors are involved.

On one hand lie increased concern with the need to
protect the privacy of individual citizens and a sense
that the public is oversurveyed and less willing to
respond to government inquiries. Indeed, in the last few
years, the level of public suspicion and hostility to
plans for the census caused the governments of several
Western European countries to delay their census programs
or cancel them entirely (see Butz, 1984; Redfern, 1983).

On the other hand, legislators have increasingly turned
to statistics in handling tough policy decisions. In
fiscal 1984, federal grant-in-aid programs allocated at
least $80 billion to states and local areas via formulas
that depended in important ways on census figures (or
statistics based on census figures, such as current popu-
lation estimates) to determine who received how many dol-
lars (Office of Management and Budget, 1985). As noted
above, census data are used by constitutional mandate to
determine the number of seats in the U.S. House of
Representatives that are allotted to each state. They
are used as well in drawing up congressional and state
and local legislative districts to meet rigid criteria
for equitable representation of the population. Data
requirements for redistricting purposes in 1980 included
census tabulations of the population by race (white;
black; American Indian, Eskimo, and Aleut; Asian and
Pacific Islander; other races) and Hispanic origin for
each of several million city blocks in urban parts of the
country and enumeration districts in unblocked areas
(Bureau of the Census, 1982b:79).

In addition to these critical governmental needs,
census data support many other major uses. Data from the
latest census serve to document the social and economic
condition of the country as a whole and are the single
most important source of information for small areas and
groups in the population. Comparative information from
successive censuses illuminates trends over time. Re-
searchers, planners, and decision makers in business,
government, and academic institutions make use of census
data for a wide range of important planning and analysis
purposes. Just a few of the many uses to which census
data are put (see Chapter 2 for a detailed review)
include:



• Site selection for public service facilities and
  commercial establishments based on evaluating the
  socioeconomic characteristics of alternative
  locations;

• Transportation planning using detailed data on
  commuting flows; and

• Research into changing rates of population growth
  for metropolitan versus nonmetropolitan areas and
  different regions of the country.

Many analyses based on census data have implications
for the distribution of political power and wealth among
various population groups in the country. For example,
census data on the racial, ethnic, age, and sex makeup of
occupational groups in labor market areas are used to
assess the extent to which work forces reflect the charac-
teristics of the local labor force. These data frequently
form the basis of antidiscrimination lawsuits brought
against employers. Census data on the makeup of the local
population are used to assess and challenge the repre-
sentativeness of grand and petit juries. Census data on
earnings cross-tabulated by various characteristics are
used to analyze wage disparities within and among occupa-
tions and important population subgroups. Findings from
such studies can affect outcomes of public policy delib-
erations, such as the current debate over the issue of
comparable pay for jobs of comparable worth. All of these
uses have underscored more than ever before the importance
of obtaining a complete and accurate count of the popula-
tion as well as accurate data about characteristics.

Yet to obtain highly accurate data costs money. The
1980 census cost close to $1.1 billion, about
$4.75 for each inhabitant of the United States (Bureau of
the Census, 1983b:88). The per capita amount is small
compared with the per case cost of most government and
private-sector sample surveys. Moreover, this total cost
includes planning, collection, and processing activities
that spanned most of a decade and provided data that are
of value for the decade and beyond. Nonetheless, census
costs exceeding $1 billion excite comment and invite
close scrutiny to determine how they might be reduced.
Recently in Canada, the quinquennial census scheduled for
1986 was cancelled because of budget constraints facing
the government; it was subsequently reinstated, however,
in response to widespread public expressions of concern
and its demonstrated cost-effectiveness compared with
alternatives. The U.S. decennial census is constitu-
tionally mandated; nevertheless, pressures are likely to
be severe in the coming years to attempt drastic cost
reductions both in census planning activities and in the
enumeration, despite the fact that, compared with other
ways of obtaining comparable information, the census is
still cost-effective.

The Census Bureau's own research has shown that there
were inaccuracies in the 1980 census, both of underenumer-
ation (that is, persons who were missed) and overenumera-
tion (that is, persons who were inadvertently counted
twice or otherwise included when they should not have
been). Evaluation studies generally point to the con-
clusion that, overall, the 1980 census produced a small
net undercount of the population; that is, the census
count, including erroneous enumerations, fell somewhat
short compared with an independent estimate. Most
significantly, important race, sex, and age subgroups of
the population experienced differential rates of net
undercount. There is strong evidence that the black
population experienced a net undercount of about 5
percent nationwide. Black men ages 25-54 appear to have
had the highest net undercount rates, close to 15 percent
on average (Passel and Robinson, 1984:Table 3). Coverage
estimates for whites and other races are difficult to
derive because of the lack of reliable estimates of net
legal and illegal immigration. Making a range of reason-
able assumptions about the size of the undocumented alien
population, it appears very likely that whites and other
races experienced net undercount in the 1980 census, but
that the rate of undercount was smaller, perhaps signifi-
cantly smaller, than the 1.5 percent rate experienced in
1970 (see Passel et al., 1982:6-8; see also the review in
Chapter 5 of coverage estimates for censuses from 1950
through 1980).
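
The net undercount arithmetic described above can be
sketched in a few lines of Python. This is an illustration
only: the function names and all figures are hypothetical,
and expressing the rate relative to the independent estimate
is an assumption about the convention, not something the
text specifies.

```python
def net_undercount_rate(census_count, independent_estimate):
    """Net undercount as a fraction of the independent estimate.

    Positive values mean the census count fell short of the
    estimate; negative values would indicate a net overcount.
    """
    if independent_estimate <= 0:
        raise ValueError("independent_estimate must be positive")
    return (independent_estimate - census_count) / independent_estimate


def differential_undercount(rate_group_a, rate_group_b):
    """Difference in net undercount rates between two groups."""
    return rate_group_a - rate_group_b


# Hypothetical figures, chosen only to make the arithmetic visible.
group_a = net_undercount_rate(census_count=950, independent_estimate=1000)
group_b = net_undercount_rate(census_count=990, independent_estimate=1000)
gap = differential_undercount(group_a, group_b)
```

A census could show a small overall rate and still show a
large gap between groups, which is the situation the
paragraph above describes.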

Differential net undercount means possible inequities
in redistricting, fund allocation, and provision of
social services based on census data as well as possibly
erroneous conclusions drawn from studies used as the
basis for antidiscrimination policies and lawsuits and
other socially important purposes. The belief that
errors in the census affected representation and fund
allocation gave rise to an unprecedented number of
lawsuits following the 1980 census. By October 1981,
over 50 suits had been filed challenging the census
results (Bureau of the Census, 1983b:85). Currently, the
judge assigned to a major case in which the State and the
City of New York are suing to have the Census Bureau
adjust the 1980 census counts is reviewing testimony and
preparing to hand down a decision; 23 other cases are
awaiting settlement of the New York suit. Analyses by
Kadane (1984) and Gilford (1983) indicate that the appor-
tionment of congressional seats may have been affected by
the differential undercount. For example, Kadane found
that if one of the sets of estimates produced from the
1980 Post-Enumeration Program evaluation were used to
adjust the census results, California would have received
an additional seat at the expense of Pennsylvania.
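
To see how a differential adjustment can move a seat, the
sketch below applies the method of equal proportions, the
statutory House apportionment formula since 1941, to two
hypothetical states. It is not a reconstruction of Kadane's
analysis; the state labels, populations, adjustment sizes,
and the six-seat chamber are all invented for illustration.

```python
import heapq
from math import sqrt


def apportion(populations, seats):
    """Allocate seats by the method of equal proportions.

    Every state starts with one seat. Each remaining seat goes to
    the state with the highest priority value
    population / sqrt(n * (n + 1)), where n is its current seats.
    """
    if seats < len(populations):
        raise ValueError("need at least one seat per state")
    allocation = {state: 1 for state in populations}
    # heapq is a min-heap, so priorities are stored negated.
    heap = [(-pop / sqrt(2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        allocation[state] += 1
        n = allocation[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return allocation


# Hypothetical census counts (in thousands) for two states.
unadjusted = apportion({"X": 600, "Y": 400}, seats=6)
# Hypothetical adjusted counts: X raised by 1 percent, Y by 7.5 percent.
adjusted = apportion({"X": 606, "Y": 430}, seats=6)
```

With these invented figures the last seat shifts from X to
Y after adjustment, because Y's priority value for a third
seat rises just above X's priority value for a fourth.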



PROPOSED CHANGES IN METHODOLOGY 

Not surprisingly, many ideas have been proposed by the
Census Bureau and others to improve the decennial census.
Some are directed principally at improving coverage and
reducing differential coverage errors. One idea in this
class is to match administrative records, such as driver's
license lists and other sources, against the census on a
scale even larger than that used in 1980 to identify
people who should be followed up to determine if they
were improperly omitted from the census count. (See
Chapter 3 for a description of the efforts along these
lines in the 1980 census.) Other ideas are directed
principally at reducing costs. One such approach is to
make use of sampling, not only to obtain information on
characteristics, as is currently standard decennial
census practice, but also as part of the procedure to
obtain the count. For example, one could attempt contact
with a sample of households that do not mail back their
questionnaires, rather than all nonrespondents, in the
follow-up stage of census operations. Special coverage
improvement procedures could also be carried out on a
sample basis.
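
The idea of following up only a sample of nonresponding
households can be illustrated with a simple expansion
estimator. This is a sketch under stated assumptions (a
simple random sample of nonrespondent households, with
every sampled household successfully resolved); the
function name and the figures are hypothetical and do not
describe an actual Census Bureau procedure.

```python
def estimate_population(mail_count, nonrespondent_households, sampled_counts):
    """Estimate total population when follow-up covers only a sample.

    mail_count: persons counted on mailed-back questionnaires.
    nonrespondent_households: number of households that did not respond.
    sampled_counts: persons found in each sampled nonrespondent
        household (0 for a unit found to be vacant).
    """
    n = len(sampled_counts)
    if n == 0 or n > nonrespondent_households:
        raise ValueError("sample size must be between 1 and the number "
                         "of nonrespondent households")
    # Each sampled household stands in for this many nonrespondents.
    weight = nonrespondent_households / n
    return mail_count + weight * sum(sampled_counts)


# Hypothetical area: 800 persons counted by mail, 100 nonresponding
# households, 5 of which are sampled for follow-up.
estimate = estimate_population(800, 100, [2, 0, 3, 1, 4])
```

The saving comes from visiting 5 households instead of 100;
the price is sampling error in the 200 persons attributed
to nonrespondents, which a fuller treatment would quantify.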

Two important themes stand out in current discussions 
of methodology for the decennial census. One relates to 
the degree of emphasis that should be given to counting 
versus estimation. A census, no matter how diligently 
administered, can never be complete or without error. 
Moreover, it is true of current census methodology that 
not every record corresponds to a person actually named 
on a questionnaire; for example, a small percentage of 
records (about 1 percent in recent censuses; see Chapter
3) represents imputations in situations in which there is
good evidence that a housing unit is occupied but repeated
efforts have failed to find the residents. Hence, a
census, strictly speaking, provides an estimate of the
population.

From this recognition has come a view of the decennial
process that emphasizes the role of estimation and argues
that some of the resources for conducting the census
should be shifted from efforts directed toward traditional
coverage improvement procedures to efforts directed toward
developing the best possible estimates of the total popu-
lation and subgroups. Input to the decennial year popu-
lation estimates, in one version of this view (Ericksen
and Kadane, 1985), would include not only a well-conducted
census, but also information obtained from various pro-
grams conducted on a sample basis that would provide a
basis for adjusting the census field counts. (Such pro-
grams might include matching of samples of administrative
lists to census records and follow-up of a sample of
households that did not respond to an initial follow-up
effort.) Whatever the merits of particular suggestions
put forward to incorporate estimation into the census
process, the known errors and the incompleteness of the
census count mean that the issue of adjusting census
figures needs to be addressed.

The other theme relates to the critical importance of
evaluation programs in the methodology of the decennial
census. Politicians, policy analysts, statisticians,
economists, demographers, other social scientists, and
users of census data in all sectors have expressed diver-
gent views regarding the most appropriate methodology for
conducting the census. But whether they view the census
in traditional terms as strictly a counting operation or
believe that the census should be the starting point for
an estimating process, there is substantial agreement on
the importance of evaluating the completeness and accuracy
of census statistics.

The Census Bureau has conducted formal evaluation pro-
grams for every census since 1950 (Bureau of the Census,
no date-a). All of the techniques used to date in this
country and abroad, including demographic analysis,
reverse record checks, administrative record matches, and
post-enumeration surveys (whether recanvassing selected
areas or matching independent surveys to census records),
have important flaws (see the review of coverage
evaluation methods in Chapter 4). In the United States
today, the absence of adequate data for estimating net
immigration, whether of legal or illegal residents (Marks,
1980), poses particularly severe problems for evaluating
the census count even at the national level using the
demographic method. Furthermore, if evaluation results
were to be used for census adjustment purposes,
reasonably accurate information on the errors of evalua-
tion estimates would also be needed. Nevertheless, given
concern over possible inequities in political representa-
tion and the distribution of large amounts of federal
dollars as well as concern over the adequacy of the data
for analysis of the socioeconomic status of important
population groups, there has never been a greater need
for thorough evaluation of the decennial census. Such
evaluation is necessary whether the object is to inform
users of known errors in the census or actually to adjust
census results.

While there is widespread agreement that evaluation is
important and that the issue of adjustment must be addressed,
many decisions on methodology for 1990 remain to be
made. It is clear that there is no lack of ideas and
suggestions that appear useful to investigate. It is
also clear that the process of determining a reasonable
methodology for 1990 will involve difficult choices.

Ideally, one would like the 1990 and future censuses
to improve the accuracy of the data over that in 1980,
maintain the amount of useful information collected, and
release the results on a more timely basis, while at the
same time reducing the burden on the public and lowering
costs. The Census Bureau has stated (Bailar, 1984:[...])
that its minimum goals for 1990 are to:

(a) Conduct the 1990 Census without increasing the
per-housing-unit cost in 1980 dollars. (b)
Expedite the availability of the data to users.
(c) Maintain a high rate of overall coverage and
improve the accuracy of small area data while
reducing the overall differential for population
groups and geographic areas.

It may be possible to design a methodology that makes
gains in the desired direction on each of these dimen-
sions. The more likely situation is that it will be
possible to make progress on one or two dimensions at
the price of giving up improvements on the others.
Explicit trade-offs reflecting costs and benefits will
need to be made in the choice of methodology for 1990 and
beyond (see Keyfitz, 1979). Because of the high cost of
censuses and the compressed time frame within which they
are carried out, making mid-course corrections impossible,
it is essential that the methodology to be used be
thoroughly tested.



INDEPENDENT REVIEWS OF DECENNIAL CENSUS PLANS

The Census Bureau is actively working on methodology for
the 1990 census and has assembled staffs to plan the
census and specifically to work on issues of undercount
and the possible adjustment of census counts. For many
decades, the Census Bureau has also actively sought
outside independent review of its plans and proposed
procedures. In addition to ongoing advisory committees
involving various professional disciplines and advisory
committees representing the interests of population
groups for whom census results are particularly impor-
tant, the Census Bureau has asked the National Research
Council (NRC) and the American Statistical Association
(ASA) to conduct special studies of the decennial census.
This report represents the fourth outside review conducted
in recent years of key aspects of modern census method-
ology. A brief discussion of the scope and thrust of the
predecessor NRC and ASA studies can help place the
current study in context.



The 1969-1972 NRC Advisory Committee on Problems of Census Enumeration

The Bureau of the Census sponsored a study in 1969 by a
committee of the National Research Council to provide
advice on ways to improve completeness of coverage in the
decennial census and intercensal household surveys. (The
Office of Economic Opportunity and the Manpower Adminis-
tration of the U.S. Department of Labor also contributed
support for the study.) The Advisory Committee on
Problems of Census Enumeration issued its final report,
America's Uncounted People, in 1972. The report focused
on the need to understand the social and psychological
context in which undercount occurs. For example, the
committee noted that people may be missed in central city
areas because, although members of extended families,
they are not attached to a family or household residence,
which is the basic unit of enumeration in the census and
household surveys. The committee strongly recommended
that the Census Bureau broaden its research strategy and
knowledge base to include methods and concepts not
typically embraced in survey research. The report
included specific recommendations to conduct experimental
studies of questionnaire wordings and formats and their
effects on respondents; explore the utility of communica-
tion research for better understanding the reasons for
census and survey undercoverage; and carry out localized
participant-observer studies to learn more about the
impediments to census data collection in different kinds
of areas.



The 1978 NRC Panel on Decennial Census Plans 

The Census Bureau asked the National Research Council 
again in 1978 to review decennial census methodology,
specifically the plans for the upcoming 1980 census. The 
NRC's Committee on National Statistics set up the Panel 
on Decennial Census Plans, which worked within a very 
short time span, to: (1) examine field procedures, 
questionnaire design, and special procedures designed to 
improve the 1980 census coverage, (2) review proposed 
procedures for handling contested counts, (3) investigate 
the feasibility of adjusting census counts, and (4) con- 
sider evaluation plans for the 1980 census. The panel's 
report, Counting the People in 1980: An Appraisal of 
Census Plans, made recommendations in many areas. This 
panel repeated the call of the earlier committee for 
imaginative work on the cultural and social problems 
associated with census-taking. In the area of adjustment,
the 1978 panel concluded (National Research Council, 
1978:132-133) that: "methods of adjustment with tolerable 
accuracy are feasible" and "on balance an improvement in 
equity would be achieved." The panel supported imple- 
mentation of procedures to adjust population counts for 
underenumeration for purposes of fund distribution and
expressed confidence in the Census Bureau to determine 
the best technical procedures for adjustment. The panel 
recommended that adjustment "not be applied to the counts 
used for legislative apportionment nor to the body of 
census data on the characteristics of the population." 



The 1981-1982 ASA Technical Panel on the Census Undercount

The Census Bureau asked the American Statistical Associa-
tion in 1981 to convene an expert group to review the
methods and results of the programs used to evaluate
completeness of coverage in the 1980 census and to make
recommendations regarding research in the areas of cover-
age evaluation and adjustment of census counts. This
panel made a number of specific research suggestions and
also recommended (ASA, 1984:256): "that the Bureau of
the Census sponsor an outside technical advisory group on
undercount estimation and related problems."



The 1984 Panel on Decennial Census Methodology

In response to the recommendation of the 1981 ASA panel,
the Census Bureau asked the Committee on National Statis-
tics at the National Research Council to establish the
Panel on Decennial Census Methodology. The charge to the
panel was for an investigation of three major issues from
a technical viewpoint, setting aside legal
considerations:

(1) Adjustment of census counts and characteristics, 
including exploration of formal criteria to 
evaluate measures of undercount and alternative
adjustment procedures; 

(2) Uses of sampling in the decennial census, 
including investigation of whether, for a given 
cost, the sampling of lists and areas to improve 
coverage and sampling of nonrespondents for 
follow-up can improve accuracy for the total 
population and for important subgroups; 

(3) Uses of administrative records, including inves- 
tigation of various types of records to determine 
their possible utility in improving the accuracy 
of census counts and the efficiency of census 
operations. 

We interpreted this charge to include investigation of
closely related topics, notably methods of coverage
evaluation and improvement. Coverage evaluation programs
provide the necessary input data for any adjustment and
serve the important function of apprising users of the
quality of the census counts. Procedures for coverage
improvement are viewed by the panel as important and
desirable even if an adjustment procedure is incorporated
into census methodology. The panel also investigated
uses of census data and their dependence on the accuracy
of the census figures. Proper evaluation of the conse-
quences of changes in collection methodology requires an
understanding of important uses of the data being
collected.

The charge to the panel related to analysis of decen-
nial census methodology and not to other population
programs of the Census Bureau. However, in the course of
the panel's work, it became clear that the census could
not be considered completely in isolation. Demographic
and related social and economic statistics are used con-
tinually over the decade following each census, and
current information is needed for these uses. The Census
Bureau has a number of formal programs for updating some
of the census information. Hence, the census is the
central part of a broader statistical system designed to
produce data needed to implement legislation, assist in
decision making both by industry and government, and help
understand changes taking place in our society. Although
the panel did not undertake a study of population statis-
tics programs other than the census, we did consider the
quality of census data compared with the quality of post-
censal population estimates. The panel makes a recommen-
dation to assess the need for a mid-decade census in 1995
in light of the impacts of errors in postcensal population
estimates on major data uses, such as fund allocation
(see the discussion in Chapter 2).

The work of the panel differs in several important
ways from the efforts of its predecessors. This is the
first panel to be asked explicitly to consider important
changes in decennial census methodology from the perspec-
tive of cost as well as effectiveness. A theme running
through the charge to the panel is to design a methodology
that improves accuracy compared with previous censuses
but costs no more, and ideally less, in constant dollars.

Other important differences have to do with the timing
of the panel's work in relation to the cycle of decennial
census planning. The panel was convened at a point in
the cycle when it could benefit from the availability of
extensive material regarding the experience in the most
recent census. At the same time, the panel carried out
its work in an early stage of the planning cycle for the
next decennial census before decisions on methodology
were fixed. Hence, the panel has been in an unusually
good position to provide suggestions and guidance
regarding the research and testing program for 1990. In
fact, the panel's role has been one of assessing and
reacting to Census Bureau research and pretest plans for
1990 rather than attempting to recommend, at this stage
in the process, the adoption of specified procedures and
concepts as the methodology for the 1990 census. The
panel does not presume to have the answers regarding the
"best" methodology for the decennial census. We have
endeavored to state our position and to recommend direc-
tions for needed research on critical issues of what
constitutes cost-effective methodology, particularly the
relative emphasis to be given to counting versus
estimation.



MAJOR THEMES OF THE REPORT

Several themes run through this report. The first major
theme can be expressed as the need for balance between
traditional and new procedures in the choice of census
methodology for 1990. Indeed, balance has characterized
the historical evolution of decennial census methodology.
The panel does not propose that the Census Bureau make
radical innovations in decennial census methodology in
the near term. The census is a massive and complex
operation and major changes should be made only with care
and after thorough evaluation including tests carried
out under actual census conditions. Nonetheless, the
panel believes that it is important to implement changes
in some dimensions for 1990 and to undertake planning
that may lead to further changes in the future.

Most important, the panel argues for balance between
efforts to achieve a complete enumeration and efforts to
improve the accuracy of census figures through adjustment
procedures. The panel believes that adjustment cannot be
viewed as an alternative to obtaining as complete a count
as possible through cost-effective means. The United
States has a long tradition of a census as a complete
enumeration in which it is a civic responsibility to
participate in the census process. The panel believes it
is important to continue this tradition and important
that census methodology should strive for a complete
enumeration via counting procedures, including the use of
cost-effective special coverage improvement programs.
However, the panel also believes that the ultimate goal
of the census should be the accuracy of the census
figures. The evidence is overwhelming that no counting
process, however diligent, will in fact enumerate every-
one. Hence, the panel recommends that the Census Bureau
carry out a vigorous program of research on coverage
evaluation and adjustment methods that, if successful,
would permit adjustment of census figures as part of the
methodology for the 1990 census (see the discussion in
Chapters 7 and 8).

A second and related theme concerns cost-effectiveness.
The panel has not attempted to apply formal cost-benefit
analysis to decennial census methodology, but has endeav-
ored to identify those proposed changes that show the
most promise of improving accuracy without increasing
costs or of reducing costs without importantly impairing
accuracy. In this regard, the panel's recommendation for
research designed to develop appropriate and feasible
methods of adjustment of the census counts, together with
the Census Bureau's stated goal to contain costs for the
1990 census, implies that some budget resources must be
shifted from coverage improvement to coverage evaluation
and adjustment. Specifically, the panel argues that
coverage improvement programs used in previous censuses
should be carefully reviewed to determine their efficacy.
Costly programs that neither correctly added significant
numbers of people to the count nor importantly reduced
differential undercount should be dropped from the Census
Bureau's plans for 1990 (see the discussion in Chapter
5). Effective programs, however, should be further
refined through testing and research, and the budget
should make room for testing some new ideas in this area.

While not favoring extensive use of sampling to obtain
the count, the panel supports research on using sampling
in the later follow-up stages of census operations and in
some coverage improvement programs, such as the program
to recheck the vacancy status of housing units. Limited
use of sampling may effect measurable cost savings with
minimal sacrifice of accuracy (see the discussion in
Chapter 6). Careful use of sampling for certain coverage
improvement programs may, in fact, improve accuracy by
reducing duplications and other erroneous enumerations
in addition to identifying missed households and people.

In considering cost and accuracy, the panel believes
it is important to look at the characteristics data
collected in the census as well as the population count.
There is strong evidence that important subject items
have severe reporting problems. The panel recommends a
strategy of looking closely at each item proposed for
inclusion on the questionnaire to determine: (1) the
need for that item, (2) the level of geographic detail
required by users, and hence whether the item must be
asked of all households on the short form, or whether it
can be asked of a sample on either the long form or on a
much smaller follow-on survey, and (3) whether some other
source could provide higher-quality data. The panel
suggests exploring the use of administrative records
together with sampling to obtain data on some housing
structure characteristics (see Chapter 6). Such data
could be more accurate than individual responses on the
census form. Costs initially may be high, but should
decline over time. This particular use of administrative
records has the advantage that it should present no
actual or perceived threat to individual privacy.

A third major theme of the report concerns the 
strategy for designing the 1990 census, whatever the 
particulars of the methodology may turn out to be. The 
research plans drafted by the Census Bureau staff are 
extremely comprehensive and ambitious. The staff has 
clearly tried to include all reasonable ideas for con- 
sideration in the research and testing program. The 
panel commends the Census Bureau's efforts to design and 
carry out a thorough research and testing program that 
will support sound decisions regarding methodology for 
the 1990 and later censuses. 

The panel believes, however, that in most areas the 
Census Bureau must choose among all the ideas and pro- 
cedures proposed for testing, given constraints on 
available staff and budget resources and the limited time 
available to analyze test results and use them to guide 
decisions on methodology. The exception concerns research 
related to adjustment, including research on coverage 
evaluation methods. In this area, the panel believes 
that research must proceed on a broad front if effective 
methodologies are to be developed for 1990. In other
areas, the panel has endeavored to recommend strategies 
for choosing priority projects for inclusion in the 1990 
census research and testing program and has also recom- 
mended the use of less costly research methods, where 
appropriate, including more detailed analysis of 1980 
census results, in place of full-scale field tests. 
Finally, the panel recommends specific concepts and 
procedures for research and testing that we believe show 
special promise for improving the methodology of the 
decennial census in 1990 and beyond. 



OVERVIEW OF THE REPORT AND RECOMMENDATIONS

This section provides an overview of the report and a
summary of the panel's recommendations. The report is
organized as follows: Chapters 2, 3, and 4 provide
background on the decennial census that is useful for
understanding the subsequent exposition of the panel's
recommendations. Chapters 5 through 8 provide general
and specific recommendations in several areas. Each
chapter includes one or more appendixes that provide
additional details for the interested reader on topics
discussed in the text. Below we briefly summarize the
contents of each chapter and list the panel's
recommendations.



Chapter 2 Purposes and Uses of the Decennial Census

In choosing an appropriate and cost-effective methodology
for a data collection program, it is important to under-
stand the kinds of uses that are made of the statistics.
The chapter gives a brief overview of the uses of census
data historically and then describes the range of uses of
the modern census; most importantly, the census is vir-
tually the only source of comparable data on basic
and detailed characteristics for small areas and small
groups in the population. Two appendixes provide addi-
tional information on uses and users of census data:
Appendix 2.1 reviews state and local government uses;
Appendix 2.2 provides a case study of government,
business, and academic uses in the State of New Jersey.
The chapter includes a review of the limited body of
research that has attempted to measure the effects of
census data errors on key uses, such as reapportionment,
redistricting, and allocation of federal funds to states
and localities. Finally, the chapter reviews data on
errors in postcensal population estimates for small areas,
which appear to greatly exceed errors in the census
itself. The panel recommends serious consideration of
the need for a mid-decade census program in 1995 to
improve the quality of postcensal estimates:

2.1. We recommend that the Census Bureau assess the
need for a mid-decade census, particularly by studying
the effect of errors in postcensal population estimates
compared with errors in the decennial census on
data uses. Unless these studies do not support the
value of a mid-decade census, the Census Bureau should
proceed with preparations and make every effort to
secure funding to conduct a census in 1995.



Chapter 3 Census Methodology: Prior Practice and
Current Test Plans

This chapter briefly reviews the procedures used to
conduct the 1980 census and compares and contrasts the
1980 census methodology with procedures used in previous
modern censuses in the United States and in other Western
countries. The discussion covers the following stages in
the census process: development and checking of address
lists, enumeration, follow-up, coverage improvement, data
processing, and post-census evaluation. The chapter also
summarizes the Census Bureau's current research and test-
ing plans for the upcoming 1990 census, with particular
emphasis on the plans for pretests in spring 1986, and
presents the panel's overall assessment of these plans.

The panel has several major concerns with the research
and testing program proposed for 1986. The program out-
lined appears too ambitious for the time remaining before
the census and for the staff and budget resources likely
to be available, particularly if key data are to be
analyzed in time to support major decisions. In the
panel's view the program also places too much emphasis on
field testing over other kinds of research, including
further analysis of existing data. The panel suggests in
this chapter some ways to scale back the 1986 testing
program and in subsequent chapters provides detailed
recommendations for research priorities in specific areas
of census methodology.

3.1. We recommend, to ensure cost-effective field
testing and preservation of adequate resources for
analysis, that the Census Bureau attempt to identify
research and testing proposals for 1986 that:

(a) Can be pursued with other research methods and
omitted from the 1986 field test program;

(b) Can be safely deferred for research or testing
until 1987 or until the dress rehearsals;

(c) Are unlikely to be viable for 1990 but should be
incorporated on an experimental basis into the
1990 census as a test for future censuses; and

(d) Should be omitted entirely from consideration for
the 1990 census, based on previous census
experience or other survey research results.

3.2. We recommend that the Census Bureau make full
use of data from the 1980 census and from experiments
carried out in 1980 to help guide planning for 1990.
To this end, we recommend that the Census Bureau
assign a high priority to completion of 1980 census
methodological studies, and we encourage further
analysis of these data where appropriate.

Appendixes to Chapter 3 provide more detailed explana-
tions of two aspects of current census methodology:
sequential hot-deck imputation, used to assign values to
census records for missing responses (Appendix 3.1), and
iterative proportional fitting, used to calibrate
responses obtained from samples of households to responses
obtained from the entire population (Appendix 3.2). Both
techniques have potential use, as discussed in Chapter 7,
in carrying out an adjustment of the census counts.
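
As a rough illustration of the two techniques named above,
the following Python sketch shows a simplified sequential
hot deck and a two-way iterative proportional fitting
routine. It is not the Census Bureau's production
procedure and does not reproduce Appendixes 3.1 and 3.2:
the function names, the record layout, the starting
values, and the convergence tolerance are all assumptions
made for this example.

```python
def sequential_hot_deck(records, item, cell_key, start_values):
    """Fill missing values of `item` (None) with the value most recently
    reported by an earlier record in the same imputation cell.

    `start_values` (hypothetical) maps each cell to a starting value that
    is used until a reported value has been seen in that cell.
    """
    last_seen = dict(start_values)
    completed = []
    for rec in records:                  # records are processed in file order
        rec = dict(rec)                  # leave the caller's data untouched
        cell = cell_key(rec)
        if rec[item] is None:
            rec[item] = last_seen[cell]  # donate the stored value
            rec["imputed"] = True
        else:
            last_seen[cell] = rec[item]  # refresh the deck with a reported value
            rec["imputed"] = False
        completed.append(rec)
    return completed


def iterative_proportional_fitting(table, row_totals, col_totals,
                                   tol=1e-9, max_iter=1000):
    """Scale a two-way table of sample estimates so that its margins agree
    with control totals, alternating row and column adjustments."""
    fitted = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Row step: make each row sum to its control total.
        for i, target in enumerate(row_totals):
            s = sum(fitted[i])
            if s > 0:
                fitted[i] = [v * target / s for v in fitted[i]]
        # Column step: make each column sum to its control total.
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in fitted)
            if s > 0:
                for row in fitted:
                    row[j] *= target / s
        # Stop once the row margins still hold after the column step.
        gap = max(abs(sum(fitted[i]) - t) for i, t in enumerate(row_totals))
        if gap < tol:
            break
    return fitted
```

In this sketch the hot deck depends on the order of the
records, which is why the census file order matters for
such procedures, and the fitting routine requires that the
row and column control totals sum to the same grand total.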



Chapter 4 Evaluating the Decennial Census: Past
Experience

This chapter reviews the history of coverage evaluation
of population censuses in the United States from 1950
through 1980. Broadly speaking, there are two major
classes of coverage evaluation techniques: micro-level
methods based on case-by-case analysis of samples of
units such as persons or households and macro-level
methods involving analysis of aggregate census data,
including comparison of census totals with external data
and analysis of internal consistency. The chapter
identifies strengths and weaknesses of each of the main
micro-level methodologies used in the United States and
Canada, including post-enumeration surveys, reverse record
checks, and administrative records matches, and of the
major macro-level method, demographic analysis. Although
the chapter does not contain recommendations, it does
provide important background material for the panel's
conclusions presented in Chapter 8 regarding research on
coverage evaluation estimation methods for the 1990
census. Chapter 4 includes two appendixes providing
technical details on coverage evaluation based on
multiple lists (Appendix 4.1) and on operational issues
and modeling of computer matching (Appendix 4.2).
Matching of records from the census to records obtained
from one or more surveys or administrative lists is a
critically important component of most methods of
evaluating the completeness of coverage of the population
obtained in the census.
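
Coverage evaluation based on two lists is commonly
illustrated with the dual-system (capture-recapture)
estimator. The sketch below is a textbook version offered
only as an illustration, not the formulation given in
Appendix 4.1. It assumes that the two lists are
independent, that matching is free of error, and that
neither list contains erroneous enumerations; the function
name and the example counts are hypothetical.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Return the dual-system estimate of the true population and the
    implied net undercount rate of the census.

    census_count  -- persons counted by the census in the sample areas
    survey_count  -- persons found by the independent survey or list
    matched_count -- persons found in both sources after matching
    """
    if matched_count <= 0:
        raise ValueError("at least one matched record is required")
    if matched_count > min(census_count, survey_count):
        raise ValueError("matches cannot exceed the size of either list")
    # Under independence, matched / survey estimates the census coverage
    # rate, so the population estimate is census / (matched / survey).
    population = census_count * survey_count / matched_count
    undercount_rate = 1.0 - census_count / population
    return population, undercount_rate


# Hypothetical counts: 950 census persons, 940 survey persons, 893 matched.
population, undercount = dual_system_estimate(950, 940, 893)
```

Because the matched count sits in the denominator, even a
small rate of false nonmatches inflates the population
estimate, which is one reason matching quality receives so
much attention.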



Chapter 5 Taking the Census I: Improving the Count

This chapter focuses on the problem that not all popula- 
tion groups are counted equally well in the census and 
discusses procedures for improving the count, including
procedures used in past censuses together with some new 
ideas. Most programs directed toward coverage improvement 
are expensive. They may also introduce error by dupli-
cating or otherwise erroneously adding persons. In
general, however, the panel believes that the costs of 
well-designed and well-executed coverage improvement 
programs represent money well spent for improving the 
census figures. The chapter first reviews what is known 
about hard-to-count groups in the population and about 
groups that have exhibited problems of erroneous enumera- 
tions. (The text provides a summary of the literature, 
and Appendix 5.1 provides a more in-depth review.) The 
chapter then reviews the performance of special programs 
directed toward coverage improvement in the 1970 and 1980 
censuses, including estimates of cost and numbers of 
persons and housing units added to the count. The panel 
makes recommendations for coverage improvement related to 
items on the questionnaire and to enumeration procedures. 

The panel first notes the importance of gaining 
understanding of the problems of undercount and overcount 
in the census. 

5.1. We recommend that the Census Bureau assign a 
high priority to the completion of studies of 
undercount and overcount in the 1980 census. 

5.2. We recommend that the Census Bureau set up a 
timetable and assign staff to permit completion of the 
analysis of 1990 coverage evaluation results in time 
to be used in planning the first pretest of the 2000 
census. 

The panel next discusses priorities for research and 
testing directed toward improvement of items on the ques- 
tionnaire that relate to coverage, including the questions 
on race and Hispanic origin. It is important to under- 
stand what responses to the race and ethnicity questions 
mean if appropriate estimates of coverage rates for race 
and Hispanic groups are to be developed. This section 
reviews the history of race and ethnicity questions in 
the census, considers techniques for developing race and 
ethnicity questions, and discusses issues of data com-
parability, including comparability of race and ethnic
data from census to census and comparability with 
information collected in vital statistics records. 

5.3. We recommend that the Census Bureau test a
variety of question designs for the race and ethnic
information to be collected in the 1990 census,
including some that combine the collection of informa-
tion on Hispanic origin with the other race and
ethnicity information.

5.4. We recommend that the Census Bureau, in addition
to other methods that it has traditionally employed,
use the technique of focus group discussions as one
means to develop questions on particularly sensitive
items such as race and ethnicity.

5.5. We recommend that, in 1990 as it did in 1980,
the Census Bureau collect, tabulate, and release data
on race and ethnicity in such a way that the data can
be reaggregated as necessary to obtain maximum
feasible comparability with 1980 and 1970.

5.6. We recommend that the Census Bureau, the 
National Center for Health Statistics, and other 
relevant federal agencies work closely together to
design questions and response editing rules on race
and ethnicity that minimize conceptual differences
between census and vital statistics records to the
extent feasible. The Office of Management and Budget
should act as necessary to facilitate such 
coordination. 

The next section of the chapter reviews experience in
the 1970 and 1980 censuses with questions on the short
form designed to aid in achieving a complete and accurate
count, such as questions probing for a complete roster of
household members. The discussion notes problems posed
for an accurate count by the mobility of the population
and recent trends in living arrangements that have
resulted in growing populations with two or more usual
residences (for example, retired people with summer and
winter homes). The panel suggests a question for testing
directed toward improving coverage of young adults and
children in hard-to-count areas.



5.7. We recommend that the Census Bureau give high 
priority in its planning for 1990 to research and 
testing of questions and enumeration procedures that 
address problems of accurately counting persons in the 
process of moving, households with second (vacation)
homes, and persons with more than one usual place of
residence. 

5.8. We recommend, as one procedure to consider for 
improving coverage of hard-to-count groups, that the
Census Bureau pretest a question asking parents for 
names and addresses of children who are not part of 
the household. This question should be included in 
the 1986 pretests. 

The last section of the chapter provides the panel's 
overall assessment of special enumeration procedures 
designed to improve the count. While believing that 
programs such as the recheck of vacant units can make 
important contributions to improving coverage, the panel 
does not subscribe to the view that every coverage 
improvement idea that is suggested or has been used in 
the past should be included in the plans for the next 
census. The panel recommends paring down the list of 
programs to be considered for 1990 and the list requiring 
early field testing. 

5.9. We recommend that the Census Bureau review 
coverage improvement programs used in past censuses 
and proceed with research and testing directed toward 
use in 1990 of those programs that: (1) exhibited a 
high yield in terms of numbers of missed persons cor-
rectly added to the count and/or contributed signifi-
cantly to reducing differential undercoverage, (2)
exhibited low-to-moderate costs per person correctly
added, and (3) did not add many persons incorrectly. 
Programs that do not satisfy these criteria should be 
dropped from consideration unless: (1) the program 
exhibited low total dollar costs and had demonstrable 
public relations or goodwill value in previous censuses 
or (2) there is some particular reason to believe a 
revised program will yield greatly improved results. 

5.10. We recommend that the Census Bureau conduct 
full-scale pretests in 1986 only of those coverage 
improvement programs that require such testing. 
Furthermore, we recommend that the Census Bureau use
focus groups that include members of hard-to-count 
populations as one means to explore coverage improve-
ment techniques and to narrow the range of options to
be field-tested.



Chapter 6 Taking the Census II: Uses of Sampling and
Administrative Records 

This chapter addresses two major methods that have been 
proposed to improve the cost-effectiveness of the decen-
nial census: the uses of sampling in obtaining the count
and the use of administrative records. The chapter con- 
siders the merits of replacing the census with a large 
sample survey, using sampling in the follow-up stage of 
census operations, and using sampling for various cov- 
erage improvement operations. It also discusses the 
traditional use of sampling to obtain characteristics 
detail. Finally, the chapter considers the use of 
administrative records along with sampling for improving 
the quality of certain census items. 

With regard to sampling for the count, the discussion 
notes problems of replacing the census with a large 
sample survey. The panel believes that a survey would 
result in less complete coverage compared with a census 
and that there would be only minor cost savings in 
sampling on the scale necessary for satisfaction of 
present demands for small-area data from the census. The 
use of sampling for follow-up of households that do not 
return their census questionnaires has some of the same 
drawbacks, but sampling could prove cost-effective in the 
final stages of follow-up in which it is very expensive 
to count an additional person. Although the Census 
Bureau has dropped plans to study the use of sampling for 
follow-up and for coverage improvement programs such as 
the recheck of vacant units in 1986, the panel supports 
research in these areas. The panel also supports further 
testing of telephone follow-up of nonresponding house- 
holds, which was tried experimentally in 1980. Finally, 
the panel discusses the need to maintain machine-readable 
records of the follow-up history of individual households 
that will permit detailed analysis and simulation of 
different sample designs. 

6.1. We recommend that the Census Bureau not pursue 
research or testing of a sample survey as a 
replacement for a complete enumeration in 1990. 



6.2. We recommend that the Census Bureau include the
testing of sampling in follow-up as part of the 1987 
pretest program. We recommend that in its research 
the Census Bureau emphasize tests of sampling for the 
later stages of follow-up. 

6.3. We recommend that the Census Bureau keep machine-
readable records on the follow-up history of individual
households in the upcoming pretests and for a sample 
of areas in the 1990 census, so that information for 
detailed analysis of the cost and error structures of 
conducting census follow-up operations on a sample 
basis will be available. 

6.4. We support the Census Bureau's plans for further 
testing of telephone follow-up procedures in 1986. We 
recommend that the Census Bureau review the implica- 
tions for sample-based follow-up operations of the 
operational difficulties that were encountered in the 
1980 telephone experiment. 

6.5. We recommend that the Census Bureau consider the 
use of sampling for those coverage improvement 
programs that are implemented in the final stages of 
census operations and where there is potential for 
significant cost savings. We recommend that the
Census Bureau simulate sampling in the Vacant/Delete 
Check program in an upcoming pretest. 
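
To make concrete how machine-readable follow-up histories
could support a simulation of sampling, the sketch below
draws a simple random sample of late-stage follow-up cases
and expands the sample result to the full workload. It is
only an illustration: the record layout (persons added and
cost per case), the function name, and the choice of simple
random sampling are assumptions, not a description of the
Vacant/Delete Check or of any Census Bureau design.

```python
import random


def simulate_sampled_follow_up(cases, fraction, seed=0):
    """Simulate following up only a simple random sample of cases.

    cases    -- list of (persons_added, cost) pairs taken from a follow-up
                history file (hypothetical layout)
    fraction -- sampling fraction, 0 < fraction <= 1
    Returns the expansion estimate of persons added over all cases and the
    cost actually incurred on the sampled cases.
    """
    if not cases:
        raise ValueError("no follow-up cases supplied")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    n = max(1, round(fraction * len(cases)))
    sample = random.Random(seed).sample(cases, n)
    weight = len(cases) / n          # each sampled case represents `weight` cases
    estimated_persons = weight * sum(persons for persons, _ in sample)
    cost = sum(cost for _, cost in sample)
    return estimated_persons, cost
```

Repeating the simulation over many seeds and sampling
fractions would give a rough picture of the trade-off
between cost savings and sampling error for a given set of
follow-up histories.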

The chapter then reviews the use of sampling for
content items in the census. Historically, in every
census since 1940, some items have been asked of only a
sample of the population in order to reduce response
burden and processing costs while obtaining the benefits
of additional data. Sample designs and sampling fractions
have differed in recent censuses. The Census Bureau is
currently considering a design for 1990 that would
include a short form containing items asked on a 100
percent basis, a long form containing additional items
asked of a large sample, and a follow-on survey of a
small percentage of short-form households administered
within a few months of Census Day that would obtain yet
other information. The panel did not offer specific
recommendations in this area, but noted that the criteria
for including items in the follow-on survey have not been
explicitly articulated but should be, to permit thorough
assessment of the need for the survey and for the
inclusion of particular items.



6.6. We recommend that the Census Bureau refine and
make more explicit its criteria for inclusion of items 
in the proposed follow-on survey that is being 
considered for the 1990 census. 

The last section of the chapter discusses the use of 
administrative records and sampling for improving the 
accuracy of content items. The concern over completeness 
of population coverage in the census can obscure equally 
valid concerns over the accuracy of the content. There 
are well-documented problems with the reporting of content 
items such as income, utility costs, and age of struc- 
ture. The panel recommends research and testing directed 
toward improving the data quality of key items. The 
research program should include design of operations to 
verify, and possibly adjust, responses as part of the 
census operation and investigate the possibility of 
obtaining some items from administrative records sources. 
The panel makes a specific recommendation with regard to 
housing structure items. 

6.7. We recommend that the Census Bureau conduct 
research and testing in the area of improved accuracy 
of responses to content items in the census. We
recommend further that the content improvement pro- 
cedures examined not be limited to reinterviews of 
samples of respondents, but include the use of 
administrative records. 

6.8. We recommend that the Census Bureau investigate
the cost and feasibility of alternative ways of
obtaining data on housing structure items. Possi-
bilities include: (1) obtaining housing structure
information on a sample basis from administrative 
records and using this information to verify and 
possibly to adjust responses in the census; (2) 
obtaining structure information solely from adminis- 
trative records and dropping these items from the 
census; and (3) asking structure questions of a 
knowledgeable respondent such as the owner or resident 
manager. We recommend that any trial use of a 
"knowledgeable" respondent procedure include a check 
of the data obtained from such respondents against 
data from administrative records. 

Two appendixes to Chapter 6 provide additional infor-
mation related to use of sampling to obtain the census
count. Appendix 6.1 gives very rough cost estimates for
conducting the census on the basis of different-sized
samples; Appendix 6.2 develops illustrative costs for
conducting follow-up operations of a sample of nonrespon-
dents. Appendix 6.3 develops in further detail the
panel's suggestion for obtaining improved data on housing
structure items by means of local administrative records.



Chapter 7 Adjustment of Population Counts

In this chapter, the panel presents its basic position on
the issue of adjustment of the census counts. The chapter
considers criteria or yardsticks for measuring the in-
crease in accuracy of census information that adjustment
might produce and addresses problems of consistency and
timing posed by adjustment. The chapter discusses what
is known and recommends further research on procedures
that could be useful for adjustment. Three appendixes
provide additional technical discussion. Appendix 7.1
supplies the mathematical expressions for various yard-
sticks discussed in the text; Appendix 7.2 discusses in
greater detail hierarchical Bayesian techniques that have
been proposed for use in adjustment; and Appendix 7.3
discusses a problem raised by the aggregation of synthetic
estimates.

The chapter begins with consideration of the need for
adjustment to improve the accuracy of the census numbers,
particularly to reduce differential coverage errors across
geographic locations and demographic groups. The panel
is led to recommend development of adjustment procedures,
but as a complement to, not a substitute for, continued
efforts to improve census coverage. If public perception
of the importance of being counted should deteriorate,
this would have serious consequences for the accuracy of
the figures, adjusted or not.

7.1. Completeness of the count is an important goal, 
both for ensuring the accuracy of the census and for 
establishing the credibility of the census figures 
among all users. Adjustment should not be viewed as 
an alternative to obtaining as complete a count as 
possible through cost-effective means. Nevertheless, 
the ultimate goal is that of the accuracy of the 
published figures. Given the likelihood that the 
census will continue to produce different rates of 
undercoverage for various population groups, and given
the equity problems caused thereby, we recommend that
work proceed on the development of adjustment proce- 
dures and that adjustment be implemented if there is 
reasonable confidence that it will reduce differential 
coverage errors. 

The chapter next considers criteria for evaluating the 
numbers produced by the census (based on either unadjusted
or adjusted counts), considering both the errors in the
numbers themselves and the resulting loss to society due
to erroneous treatment of political jurisdictions in terms
of representation, fund allocation, and other uses of the 
data. The panel discusses yardsticks or loss functions, 
that is, numeric measures of the impact of census errors, 
from the viewpoint of the data user and as they relate to 
adjustment. The discussion notes that no adjustment pro- 
cedure can be expected to simultaneously reduce the error 
of every piece of census information for every geographic 
area; rather, there is an important net social gain if 
differential coverage error is generally reduced. The 
panel believes it is substantially more important to 
reduce the overall error per person than the overall 
error per place and recommends that loss functions for 
measuring total error take into account the population 
size of each jurisdiction. In discussing technical 
considerations concerning choice of loss functions, the 
panel concludes that good adjustment procedures should be 
expected to perform well for a range of loss functions. 
Where the choice of adjustment procedure depends impor- 
tantly on the choice of loss function, this suggests that 
the particular adjustment procedure has weaknesses that 
need to be addressed. 

7.2. In measuring the total loss associated with an 
adjustment procedure, we recommend that the contribu- 
tion to this loss attributable to a geographic region 
should reflect its population size. Thus, we recommend
against loss functions based solely on the number of 
political entities losing or gaining through adjust- 
ment. 
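
The contrast between a per-place and a per-person yardstick
can be shown with a small numerical sketch. The two
functions below are illustrative assumptions and are not the
expressions given in Appendix 7.1: the first simply counts
areas whose relative error exceeds a threshold, and the
second sums squared relative errors weighted by each area's
true population. The area populations are invented for the
example.

```python
def count_of_places_loss(true_pop, estimate, tol=0.02):
    """Number of areas whose relative error exceeds `tol`; every area
    counts once, whatever its population."""
    return sum(1 for t, e in zip(true_pop, estimate) if abs(e - t) / t > tol)


def population_weighted_loss(true_pop, estimate):
    """Sum over areas of population times squared relative error,
    which simplifies to (estimate - truth)^2 / truth for each area."""
    return sum((e - t) ** 2 / t for t, e in zip(true_pop, estimate))


# Hypothetical data: one large area and two small ones.
truth = [1_000_000, 10_000, 10_000]
unadjusted = [950_000, 10_000, 10_000]   # large area undercounted by 5 percent
adjusted = [990_000, 10_300, 9_700]      # large area improved, small areas perturbed

places_before = count_of_places_loss(truth, unadjusted)      # 1 area in error
places_after = count_of_places_loss(truth, adjusted)         # 2 areas in error
weighted_before = population_weighted_loss(truth, unadjusted)
weighted_after = population_weighted_loss(truth, adjusted)
```

In this invented case the count of places in error rises
from one to two after adjustment, while the population-
weighted loss falls sharply, because the improvement affects
far more people than the added error in the two small areas.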

7.3. We believe that, in general, the results of an
adjustment are likely to be affected more by the 
quality of coverage evaluation data and the models and 
methodology used than by the choice of loss functions. 
Given a family of loss functions with relatively simi-
lar objectives, it should be possible, and desirable,
to determine an adjustment procedure that has good 
performance for most or all of them. We recommend 
that the Census Bureau investigate the construction of 
adjustment procedures that are robust to a reasonable 
range of loss functions. 

The next section of the chapter discusses the problem
of estimating the likely range of error introduced by the
particular procedure adopted for an adjustment. Although
error can be measured only imperfectly, information about
the distribution of error is important in the same way
that sampling variances for sample surveys provide useful
information.

7.4. We recommend that the Census Bureau explore
methods for providing estimates of errors associated
with estimates of census over- and undercoverage, with
a view to publishing such error estimates along with 
coverage evaluation results and any adjusted census 
data that may be issued. 

Adjustment of census data could create problems of
internal consistency of microdata from the census with
aggregate statistics. The panel believes that internal
consistency is an important quality for general purpose
statistics, such as those produced by the decennial cen-
sus, which have a wide range of output and many uses.
The section discusses reasons to prefer carrying down any
adjustment of population estimates for larger geographic
areas to the level of the individual micro-records and
reviews methods, such as weighting and imputation, for
accomplishing this.

7.5. The panel believes that it is important to 
strive for internal consistency of published census 
figures. Should adjustment appear feasible and 
effective, methods exist for distributing adjusted 
totals for aggregated groups down to subgroup values. 
We recommend that one of these methods be used to 
achieve internal consistency of census figures. 
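
A minimal sketch of the weighting approach, under stated
assumptions, may help show why carrying an adjustment down
to the micro-records yields internally consistent figures.
Each record in a group receives the same weight, equal to
the group's adjusted total divided by its census count, so
that every tabulation built from the weighted records adds
to the same adjusted totals. The function names, the record
layout, and the numbers are hypothetical.

```python
from collections import Counter


def adjustment_weights(records, group_key, adjusted_totals):
    """Give every record in a group the same weight, chosen so that the
    weighted records in the group sum to the group's adjusted total."""
    counts = Counter(group_key(r) for r in records)
    return [adjusted_totals[group_key(r)] / counts[group_key(r)]
            for r in records]


def weighted_table(records, weights, key):
    """Tabulate the weighted records by any classification `key`."""
    table = {}
    for rec, w in zip(records, weights):
        table[key(rec)] = table.get(key(rec), 0.0) + w
    return table


# Hypothetical micro-records: four in group A, two in group B.
records = [
    {"group": "A", "tenure": "owner"},
    {"group": "A", "tenure": "renter"},
    {"group": "A", "tenure": "renter"},
    {"group": "A", "tenure": "owner"},
    {"group": "B", "tenure": "renter"},
    {"group": "B", "tenure": "owner"},
]
weights = adjustment_weights(records, lambda r: r["group"], {"A": 4.2, "B": 2.2})
by_group = weighted_table(records, weights, lambda r: r["group"])
by_tenure = weighted_table(records, weights, lambda r: r["tenure"])
```

Here the table by tenure was never adjusted directly, yet
its total agrees with the adjusted group totals because
both tables are built from the same weighted records.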

Adjustment also presents problems of timing. Current
law requires submission of state population counts for
purposes of reapportionment within 9 months after Census
Day and of small-area counts within 12 months after Census
Day for purposes of redistricting. The panel discussed
the pros and cons of various scenarios with regard to
release of adjusted data if it does not prove possible to 
implement a full-scale adjustment in time to satisfy the 
above constraints. Congress clearly will need to stipu- 
late which scenario is preferable for apportionment 
purposes. 

7.6. Census data used for reapportionment and redis-
tricting are required by law to be produced no later
than specific dates. It is possible that adjustment of 
the 1990 census will prove feasible and effective in 
all respects, except for the ability to meet the 
required deadlines. This should not necessarily 
preclude subsequent issuance of adjusted data for other 
uses. In this situation, we recommend that the Census 
Bureau seek determination by Congress of whether it 
desires that adjusted data be used and will therefore 
extend the deadlines, or wishes to adhere to current 
deadlines and will therefore stipulate the use of 
unadjusted (or partially adjusted) data for reappor- 
tionment and redistricting. 

The remaining sections of Chapter 7 review possible 
technical approaches to the use of data from coverage 
evaluation programs for adjusting the raw census figures 
(detailed discussion of these programs is in Chapter 8).
The review covers procedures for starting out, that is, 
for developing estimates for a limited number of large 
geographic areas, and procedures for carrying down, that 
is, for using the large-area estimates to develop adjust- 
ments for small areas and ultimately for the microdata 
records. The discussion considers the Census Bureau's 
plans for research and testing of adjustment procedures 
in upcoming pretests and makes recommendations for 
priority research areas. 
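
One simple way to picture the two steps of starting out and
carrying down is synthetic estimation, sketched below.
Adjustment factors are first computed for demographic
groups in a large area and are then applied to the census
counts of the same groups in each small area. This is an
illustration only, not the panel's or the Census Bureau's
chosen procedure; it assumes that coverage within a group is
the same in every small area of the large area, and all
names and figures are hypothetical.

```python
def adjustment_factors(census_totals, adjusted_totals):
    """Starting out: ratio of adjusted to census totals for each group
    in the large area."""
    return {g: adjusted_totals[g] / census_totals[g] for g in census_totals}


def synthetic_estimates(small_area_counts, group_factors):
    """Carrying down: apply the large-area factors to the census counts
    of each group in each small area.

    small_area_counts -- {area: {group: census count}}
    group_factors     -- {group: adjustment factor}
    """
    return {
        area: {g: n * group_factors[g] for g, n in groups.items()}
        for area, groups in small_area_counts.items()
    }


# Hypothetical large area made up of two small areas.
factors = adjustment_factors({"X": 1000, "Y": 2000}, {"X": 1050, "Y": 2020})
tracts = {
    "tract 1": {"X": 600, "Y": 500},
    "tract 2": {"X": 400, "Y": 1500},
}
adjusted = synthetic_estimates(tracts, factors)
```

Because the same factor is applied to a group everywhere,
the small-area estimates for each group add back to the
adjusted large-area total, but any real variation in
coverage among small areas is not captured.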

7.7. The panel recognizes that considerable work is 
still necessary and likely to lead to improved pro- 
cedures for adjusting census data. We therefore 
support the Census Bureau's stated plans to pursue, 
internally, research and development of adjustment 
procedures, and we also recommend that the Census 
Bureau vigorously promote and support related 
statistical research in the academic community. 

7.8. The panel supports the Census Bureau in its 
plans for a 1986 pretest of adjustment operations, 
including the production of mock tabulations of
adjusted census data. We recommend analysis of the 
resulting adjusted and unadjusted data sets, to help 
identify the strengths and weaknesses of the 
particular methods tried. 

7.9. We recommend that research on adjustment include:
(1) investigations of the assumptions underlying the 
procedures, (2) an attempt to evaluate empirically the 
more important of the assumptions as well as the sensi- 
tivity of methods to violation of assumptions, (3) 
study of methods used for carrying down estimates to 
lower levels of aggregation, and (4) a study of the 
impact of adjustment on uses of census data. 



Chapter 8 Measuring the Completeness of the 1990 Census

This chapter presents the panel's recommendations for
research and testing to design effective coverage evalua-
tion programs for the 1990 census. For adjustment to be
feasible, evaluation programs must be good enough to
provide estimates of net undercoverage that are reliable
for at least large geographic areas and have error
properties that are broadly understood. Coverage
evaluation programs also provide valuable information for
users of the data and for the Census Bureau in planning
subsequent censuses. Although in general the panel
recommends that the Census Bureau narrow its 1990 census
research and testing objectives, in the area of coverage
evaluation the panel believes it is too soon to focus on
one method to the exclusion of others. The panel makes
several recommendations related to the Census Bureau's
currently preferred method of pre- or post-enumeration
surveys and also recommends vigorous research related to
alternative methods, including reverse record checks and
systematic observation. Appendix 8.1 discusses
estimation methods and estimates for the illegal alien
population, and Appendix 8.2 provides estimates of
variance and cost for a large systematic observer study.

The chapter begins with a review of the problems
associated with each of the major methods of coverage
evaluation and considers the Census Bureau's current
plans for research and testing directed toward coverage
evaluation of the 1990 census. The panel argues against
the Census Bureau's decision to concentrate on post-enumeration
(or possibly pre-enumeration) survey methodology
as the principal means of coverage evaluation in
1990, notes that the Census Bureau should not put itself
in the position of lacking a means of adjustment if there
are problems with the operation for matching survey with
census records, and urges completion of 1980-based studies
related to coverage evaluation.

8.1. We recommend that the Census Bureau conduct 
research and tests of alternative coverage evaluation 
methodologies in addition to the post-enumeration 
survey, specifically reverse record checks and 
systematic observation. 

8.2. We agree that matching algorithms are very 
important to the success of several adjustment 
methods. We recommend that the Census Bureau 
investigate the development of a fallback position in 
case adequate matching is not available in 1990. 

8.3. We recommend that the Census Bureau complete and
report analyses of 1980-based tests related to coverage
evaluation, especially the Census/CPS/IRS Match
Study.

The chapter next considers possible improvements and 
recommends priority research areas for each major coverage 
evaluation method in turn. The demographic analysis 
method, which uses data from independent sources including
birth and death records to estimate the number of persons 
at the time of the census in a given age-race-sex cate- 
gory, currently suffers from the absence of data on 
undocumented aliens. The panel recommends research into 
using demographic analysis for estimates of the native- 
born population. The reverse record check method, which 
traces the current location of a representative sample of 
newborns, immigrants, and persons counted in the previous 
census or coverage evaluation program, has a greater 
problem in tracing in the United States because of the 
10-year interval between censuses (as opposed to 5 years
in Canada, where the method has been used extensively).
The panel recommends completion of a current experiment 
to test alternative methods of tracing. The chapter 
discusses at length the method of post-enumeration (or 
pre-enumeration) surveys, in which a sample of households 
is interviewed and matched with records in the census, 
and identifies several problem areas for particular 
attention in the Census Bureau's research. 




Finally, the chapter discusses the idea of using systematic
observers to provide independent estimates of the
population in a sample of areas, including but not
limited to areas that have proved particularly hard to
count in previous censuses. This method is suggested
because it may have the potential to surmount the problem
observed repeatedly in the history of coverage evaluation,
namely that persons who are missed by the census are also
likely to be missed by an independent survey or other
data source.

8.4. We recommend that the Census Bureau conduct 
research into using demographic analysis to develop 
estimates of coverage for the native-born population. 
The research should consider whether these estimates 
could usefully be combined with other estimates of 
coverage. 

8.5. We recommend that the Census Bureau move quickly 
to complete the Forward Trace Study to determine the 
feasibility of using forward trace methods in a reverse 
record check program for 1990. If the methodology is 
effective, a national sample for this purpose needs to 
be initiated by 1986. 

8.6. We support the Census Bureau's research directed 
toward developing the 1990 Post-Enumeration Program 
and recommend that such research emphasize the 
following areas: 

(a) Reduction of post-enumeration survey nonresponse;

(b) Reduction of unresolved matches between records 
for individuals listed in the post-enumeration 
survey and the decennial census; 

(c) Validation of the assumptions and/or development
of alternative methodologies with respect to 
netting-out of overcounts and undercounts with 
reference to the place of enumeration; and 

(d) Investigation of alternatives to the assumption 
that the inclusion of individuals in the post- 
enumeration survey is unrelated to their inclusion 
in the decennial census and the estimation of the 
strength of this relation. 
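
Item (d) refers to the independence assumption that underlies the usual way of combining a post-enumeration survey with census records, often called dual-system estimation. The following is a minimal sketch of that idea, not the Census Bureau's procedure: the function names and all counts and inclusion probabilities are hypothetical, chosen only to show how the estimate behaves when the assumption holds and when it fails.

```python
def dual_system_estimate(census_count, survey_count, matched):
    """Estimate the true population from two lists and their overlap.

    Assumes inclusion in the survey is independent of inclusion in the census.
    """
    if matched <= 0:
        raise ValueError("at least one matched record is required")
    return census_count * survey_count / matched


def net_undercount_rate(census_count, estimate):
    """Share of the estimated population that the census missed."""
    return (estimate - census_count) / estimate


# Hypothetical area in which the independence assumption holds.
estimate = dual_system_estimate(census_count=9_500, survey_count=9_000, matched=8_550)
rate = net_undercount_rate(9_500, estimate)

# Hypothetical area with an easy-to-count and a hard-to-count group.
# Tuples are (true size, census inclusion prob., survey inclusion prob.).
groups = [(8_000, 0.98, 0.95), (2_000, 0.70, 0.60)]
true_total = sum(n for n, _, _ in groups)
census = sum(n * pc for n, pc, _ in groups)        # expected census count
survey = sum(n * ps for n, _, ps in groups)        # expected survey count
both = sum(n * pc * ps for n, pc, ps in groups)    # expected matched count
pooled_estimate = dual_system_estimate(census, survey, both)
```

In the first case the estimate is 10,000 and the net undercount rate is 5 percent. In the second case the pooled estimate is roughly 9,800 against a true total of 10,000: persons who are hard to count in the census are also hard to count in the survey, so the estimate improves on the census count but still falls short.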

8.7. We recommend that the Census Bureau initiate a 
research program on systematic observation with a view 
toward the use of this method for a sample of areas at 
the time of the 1990 census. 




In the area of adjustment-related research, including 
coverage evaluation methods, the panel acknowledges that 
many technical and operational issues need to be resolved 
if adjustment procedures are to be developed in time for 
their use in the nation's bicentennial census in 1990. 
Overall, while much effort will be required, the panel is 
optimistic that substantial progress can be made. 



2 

Purposes and Uses 
of the Decennial Census 



The panel reviewed uses and users of decennial census
data with several objects in mind. The first purpose of
the review was to document major uses of the census and
identify their data requirements to permit the panel to
evaluate the likely impact of changes in methodology.
The second purpose was to assess or at least to inquire
into whether some uses could not be satisfied as well or
almost as well by other data collection programs. The
third purpose was to examine the sensitivity of each
major type of use to the accuracy of the data.

An inescapable conclusion from our review is that,
given the multiplicity of important purposes served by
the census, major changes in census methodology should
not be made without careful consideration of their
ramifications for a broad spectrum of uses. At the same
time, we believe that investigation of alternative
approaches to data collection might reveal opportunities
to remove some questions from the census (particularly
from the long form) or to make other changes that would
free funds for efforts to improve the data that are
collected. Examination of the sensitivity of uses to the
accuracy of the census is needed to understand the
consequences of census errors and to determine the benefit of
devoting additional resources to improving the data. How
much difference would it make in the distribution of
revenue sharing dollars, for example, if the differential
net undercount among ethnic groups in the population
could be reduced from about 4 percentage points (the
apparent difference between blacks and all others in
1980) to 2 percentage points? Would this improvement
make more or less difference than improved measurement of
per capita income, which also enters into the revenue
sharing formula?
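
A question of this kind can be explored with a small calculation. The sketch below is an illustration under stated assumptions, not the revenue sharing formula: it uses a hypothetical rule that divides a fixed sum in proportion to counted population, two hypothetical areas, and hypothetical miss rates chosen so that the differential is 4 and then 2 percentage points.

```python
def counted_population(true_pop, share_a, miss_a, miss_other):
    """Census count for an area when group A and all others are missed at different rates."""
    group_a = true_pop * share_a
    others = true_pop - group_a
    return group_a * (1 - miss_a) + others * (1 - miss_other)


def allocate(total_funds, counts):
    """Split a fixed sum among areas in proportion to their counts."""
    total = sum(counts.values())
    return {area: total_funds * c / total for area, c in counts.items()}


# Hypothetical areas: (true population, share belonging to group A).
AREAS = {"X": (100_000, 0.50), "Y": (100_000, 0.10)}
FUNDS = 1_000_000


def shortfall_for_x(miss_a, miss_other):
    """Dollars area X loses relative to an allocation based on true populations."""
    counts = {name: counted_population(pop, share, miss_a, miss_other)
              for name, (pop, share) in AREAS.items()}
    true_pops = {name: pop for name, (pop, _) in AREAS.items()}
    return allocate(FUNDS, true_pops)["X"] - allocate(FUNDS, counts)["X"]


loss_4_points = shortfall_for_x(miss_a=0.05, miss_other=0.01)
loss_2_points = shortfall_for_x(miss_a=0.03, miss_other=0.01)
```

Under these assumptions area X, which has the larger share of the harder-to-count group, loses about $4,090 of a $1,000,000 pool with a 4-point differential and about $2,030 with a 2-point differential. A formula with additional factors, such as per capita income, would need those factors added to `allocate` before any comparison of the kind the text asks about could be made.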


The panel's review of uses of the census stems from 
the belief that decisions on methodology for a data col- 
lection program should consider the various purposes the 
program is intended to serve. If a statistical program 
were being designed de novo, the responsible agency might
go through the following steps: 

Identify the fundamental purposes the program must
serve and the minimal requirements for subject
matter and geographic area detail, needed accuracy
of the data, and frequency of data collection 
required to satisfy these purposes; 

Identify secondary purposes that it would be 
desirable to accommodate and their data require- 
ments; 

Identify methodologies that, at a minimum, can 
serve the basic purposes; 

Further evaluate those methodologies on other 
criteria, such as ability to satisfy secondary 
purposes and public acceptability; 

Determine costs for each methodology of serving 
the basic purposes and the incremental costs of 
serving additional purposes; and 

Select the optimal methodology. 

In the case of the decennial census, with its long 
history of serving many uses and users, its unique role 
in determining political representation, and its operational
complexity, methodological choices cannot be nearly
as cut-and-dried as the above scheme would suggest. It
is not easy to rank uses in order of importance: what may
be of marginal direct value to federal officials may be
of great value for local planners or business people, and
it is not clear how to weight these different assessments.
A decision to assign a lower or higher priority
to a given use leads to further problems of implementation.
On one hand, it is hard to reconcile users to a
decision to scale back the level of detail or accuracy
provided or to stop serving a need altogether. On the
other hand, it is hard to make changes to tried-and-tested
procedures to accommodate new uses or to improve the 
level of detail or accuracy provided, even if cost were 
no particular object. 

The panel did not attempt to resolve these difficult 
questions, but undertook a more limited review of census 
uses and users with the objectives set forth at the 
outset. The chapter begins with a brief overview of the
uses of census data in the American past. Subsequent
sections review the distinguishing features of the modern
census that shape the uses made of the data, give examples
of major types of applications, and endeavor to draw
implications for census methodology from the data
requirements for important uses. The chapter then reviews the
limited body of research that has attempted to measure
the effects of census data errors on key purposes, such
as reapportionment and fund allocation. The concluding
section reviews research on the magnitude of errors
introduced in postcensal population estimates compared with
errors in the census itself and discusses the implications
of the research results for the utility of a mid-decade
census.



HISTORICAL USES OF CENSUS DATA

Originally, the main purpose of the decennial census in
the United States was to determine the population count
of every state for apportioning seats in the House of
Representatives. Very soon, however, the census was
expanded to collect additional information beyond basic
demographics, and policy makers and analysts began to use
the data for many purposes.

In the nineteenth and early twentieth centuries, census
data are known to have served at least the following types
of uses:

* Scholarly analysis. For example, Frederick
Jackson Turner's landmark work on "The Significance of
the Frontier in American History" (1894) rested on
analysis of census data.

* Input to public policy decisions. Census results
strongly influenced the debate at the turn of the century
that culminated in the National Origins Act of 1924, which
severely restricted immigration (Conk, 1984:10-13). A
noted Civil War historian has suggested that the 1860
census results were a factor leading the South to secede
rather than accept growing Northern population and
therefore political dominance (Nichols, 1948:460-461).

* Use for allocation of federal funds to the states.
Between 1887 and 1921, the Congress passed laws providing
for allocation of funds to the states for programs of
vocational education, agricultural extension, conservation,
highways, and public health using formulas that
included census population counts. These laws laid the
foundation for the grant-in-aid system. By 1930, the
total funds distributed by formula amounted to about $1 
million, 3 percent of the federal budget (Conk, 
1984:18-19). 

* Public information and population analysis. From 
the beginning, Americans have been keenly interested in 
what the census results show about their own place of 
residence and how it stacks up against others. Census 
results have found their way into countless speeches, 
student themes, and newspaper and magazine articles 
describing and extolling local areas and reporting 
changes over time. 

All the historical uses of census data described above
have their counterparts today. Users of early censuses 
would be astounded by the extent and depth of analysis 
made possible by modern computer technology, but they 
would readily recognize many types of applications. 



DISTINGUISHING FEATURES OF THE MODERN CENSUS AND ITS USES

The modern census in the United States has evolved in 
response to demands for data to serve a wide range of 
purposes, many of which are not served by any other data
collection program. The need to satisfy particular kinds
of important purposes has shaped census methodology, and,
conversely, the distinguishing characteristics of the 
census program have created a set of expectations among 
users regarding data they look for in the census. 

This section organizes a discussion of uses of the 
present-day census according to three main features that 
together differentiate the census from other data collection
programs: population counts for small areas, small-area
and subgroup characteristics, and historical time
series. Questions posed are: What is the census currently
expected to provide that other data collection
vehicles do not? What kinds of benefits do users anticipate
from census data as opposed to data from other
sources? What are the implications of user expectations
for proposed changes in methodology? Appendix 2.1
describes applications that state and local governments,
two important user groups, make of census data. Appendix
2.2 depicts the range of uses within a single geographic
area, New Jersey, among private, public, and academic
users. It also describes the various distribution
channels through which census data are made available to
users.1



Basic Counts for Small Areas

The census is the source of complete head counts, including
basic information about age, race, and sex, and of
residential housing counts obtained in a consistent
manner throughout the country for small as well as large
geographic areas. The census provides counts not only
for the nation as a whole and for large areas such as
regions, states, and metropolitan areas, but also for
counties, congressional districts, cities and towns, and
minor civil divisions of counties. In addition, in what
represents a vitally important and relatively recent
development, the census provides counts for local areas
including census tracts, block groups, and city blocks.

Local areas identified in the census are typically
quite small in population (as are some political
jurisdictions, such as towns and villages; see Bureau of the
Census, 1982b:Ch.4). Census tracts, first delineated
in several large cities for the 1910 census, generally
have between 2,500 and 8,000 residents and are currently
identified in every metropolitan area and some
nonmetropolitan counties. Block groups along with enumeration
districts covered the entire nation in 1980 (the former
were tabulated where there were city blocks and the latter
elsewhere) and averaged about 800 population. There were
over 2.5 million city blocks in 1980 identified in
urbanized areas, cities of 10,000 or more population outside
urbanized areas, and in other areas that contracted with
the Census Bureau to tabulate block statistics. By 1990,
blocks will be identified in all areas of the nation. All
of these types of small areas are often used as "building
blocks" in putting together information for nonstandard
census areas, such as school districts, neighborhoods,
police precincts, urban renewal areas, etc.

In contrast, the largest federal sample survey ever 
conducted, the 1976 Survey of Income and Education,



1 See U.S. House of Representatives (1982) for
additional documentation provided by many users from
government, private business, and academic institutions
of their needs for census data.




covered enough households (200,000) to provide reliable
data for states and metropolitan areas but not for any
smaller areas. Regularly recurring federal surveys, such
as the American Housing Survey and the Health Interview
Survey, contain just enough households (currently about
40,000) to produce reliable information for large states
and metropolitan areas. The Current Population Survey
(CPS), which includes about 60,000 households, is now
designed to produce estimates for all states and also 
large metropolitan areas but cannot support estimates for 
smaller areas. Some localities conduct their own censuses 
(usually contracting with the Census Bureau) or surveys, 
but these efforts do not generate comparative data for 
other areas. 

Sample surveys, even the most thoroughly conducted 
ones, also do not obtain as complete a coverage of the 
population as the decennial census. The Census Bureau 
estimates that the Current Population Survey (after 
imputation for refusals and other cases of nonresponse, 
but before ratio estimation using census-based current 
population estimates) covers only 93 percent of the 
census total (Hansen, 1984:138). 

Various administrative records systems can potentially 
provide complete counts for small as well as large areas, 
but no currently existing system covers the entire 
population in a consistent manner. Among large federal 
systems, Internal Revenue Service (IRS) records, while 
covering most persons, exclude those who do not file tax 
returns or who are not listed as dependents and, in 
addition, overcount persons who both file a return and 
are reported as a dependent on someone else's return. 
Social Security Administration records likewise both
undercount, excluding children who have not yet applied
for a card and adults who have never worked or applied
for a card and are not yet eligible for Medicare, and
overcount, including some decedents and persons with more
than one social security number. Moreover, the address
information needed to determine individuals' specific
place of residence is not fully available from these
sources: many IRS addresses represent place of business or
legal domicile rather than place of residence, and social
security addresses typically are current only for those
receiving benefits (see Alvey and Scheuren, 1982). Other 
limitations of administrative records include the diffi- 
culty of generating data on families and households and 
the paucity of characteristics information. 



Among the major uses of basic counts from the census
are the following:

* Reapportionment of the U.S. House of Representatives
according to the distribution of population among
the states. Title 13 of the U.S. Code includes a
provision requiring the Secretary of Commerce to report
state population totals to the President within nine
months after Census Day, i.e., by December 31 of the
census year.

* Redistricting within states and localities to
meet stringent court-mandated criteria for equal size and
compactness of election districts and for appropriate
representation of race and ethnic groups. Under current
law, the Census Bureau is to provide to the states within
one year after Census Day a computer tape containing
small-area population counts. The tapes provided April
1, 1981, contained total population plus race and Hispanic
origin for blocks, enumeration districts, and, where
specified by the state, precincts.

* Benchmarking of postcensal population estimates.
Census counts by age, race, and sex are the starting
point for current population estimates produced between
census years for geographic areas ranging from the nation
as a whole to states, counties, and all 39,000 political
jurisdictions recognized for federal revenue sharing.

* Calibration of data from other collection
programs. Census-based current population estimates by
age, sex, race, and Hispanic origin are the basis for
weighting the output from federal surveys such as the
Current Population Survey and the Health Interview Survey.

* Calculation of vital rates. Census counts and
census-based population estimates by age, race, sex, and
geographic area serve as denominators for rates of births,
deaths, marriages, and divorces produced for the nation
and the states from the vital statistics program.

* Allocation of federal and state dollars to states
and localities. A large number of grant-in-aid programs
include the total population as one element in the
allocation formula. The best known of these programs is
general revenue sharing.

* Determination of eligibility for funding from
government programs and of local rights and responsibilities.
A number of grant programs have thresholds for
eligibility; for example, the Job Training Partnership
Act generally designates service delivery areas as counties
or cities with 200,000 or more population. Most
states classify counties and municipalities by size and
accord various rights and responsibilities to each size
class.

* Public planning and decision making. For example,
cities examine census counts by police precinct, school 
district, fire precinct, and many other kinds of adminis- 
trative areas built up from census geography such as city 
blocks to allocate personnel and budget in proportion to 
population and housing, to redraw administrative areas to 
equalize demands for basic services, and to serve as a 
starting point for projecting future public needs. 

* Business planning and decision making. Retailers 
locating sales outlets, for example, compare population 
density and demographic characteristics in areas sur- 
rounding possible sites. 

* Comparison and ranking of areas, such as cities 
and metropolitan centers, by population, for many pur- 
poses such as advertising, marketing, and public information.
Even the most casual review of the nation's media
quickly reveals the extent of reliance on census and 
census-based statistics for articles, maps, and graphs on 
national, regional, and local social and economic charac- 
teristics (see Rowe in U.S. House of Representatives, 
1982:424-428, for data on use of federal statistics in 
The New York Times) . 

As noted at the outset of this chapter, the panel 
believes that the uses of census data should be examined 
periodically, by the Census Bureau and others, to reassess 
their importance and the possibilities for meeting them 
from alternative sources. Some uses of basic small-area 
counts may appear unimportant or even frivolous and not 
worth expenditure of public funds. However, other uses 
are fundamental to our federal governance (including 
reapportionment and redistricting) or to the efficient
delivery of goods and services in the public and private 
sectors, and demonstrate why basic small-area counts 
constitute the heart of the census program. 

Most of the important uses of basic census population 
(and housing) counts cited impose the requirements that 
data be collected in a complete and comparable way for 
all manner and size of geographic areas, with consequent 
implications for proposed changes in methodology. The 
requirement for comparable data across areas strongly 
implies the need to obtain estimates of the population 
more or less at a point in time. The requirement for 
comparable data argues as well for the need to standardize
processes of data treatment and estimation to the extent
practicable. The fact that users expect to be able to 
obtain counts for very small areas, such as blocks and 
tracts, and to use these counts to reaggregate the data 
into other kinds of areas such as school districts or 
police precincts, implies the need to incorporate any 
estimation or imputations used into the microdata records 
so that consistent totals can be produced for whatever 
tabulations are requested. 
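
The consistency requirement described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that any estimation or adjustment is expressed as a weight stored on each microdata record; the record layout, group labels, factors, and the block-to-district mapping are all hypothetical.

```python
from collections import defaultdict

# Hypothetical microdata: one record per person, with block and adjustment group.
records = [
    {"block": "B1", "group": "g1"},
    {"block": "B1", "group": "g2"},
    {"block": "B2", "group": "g1"},
    {"block": "B3", "group": "g2"},
    {"block": "B3", "group": "g2"},
]

# Hypothetical adjustment factors developed for large areas, by group.
factors = {"g1": 1.05, "g2": 1.01}


def attach_weights(records, factors):
    """Carry each group's factor down to the individual records as a weight."""
    return [dict(r, weight=factors[r["group"]]) for r in records]


def tabulate(records, key):
    """Sum record weights for whatever area definition the key function supplies."""
    totals = defaultdict(float)
    for r in records:
        totals[key(r)] += r["weight"]
    return dict(totals)


weighted = attach_weights(records, factors)

# A nonstandard area (a school district, say) built up from blocks.
block_to_district = {"B1": "D1", "B2": "D1", "B3": "D2"}

by_block = tabulate(weighted, lambda r: r["block"])
by_district = tabulate(weighted, lambda r: block_to_district[r["block"]])
```

Because every tabulation sums the same record weights, the district totals equal the sums of their component blocks and the grand total is the same however the areas are drawn. An adjustment applied only to published tables for selected areas would not have this property.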

It is of course possible to challenge these arguments 
or to state that other considerations must take prece- 
dence. However, the panel in subsequent chapters evalu- 
ating proposed changes in methodology justifies the 
premise that comparability and consistency of census 
figures are requirements that methodological innovations 
should satisfy unless there are compelling reasons not to 
do so. 

With regard to the requirement for completeness or 
accuracy of basic census counts, the picture is somewhat 
less clear. Ideally, every user would like a completely 
accurate set of numbers, but it is recognized that it is 
impossible to obtain a perfectly complete count. The 
question becomes the tolerable level of accuracy for a 
particular application. For many uses of the basic 
counts, such as allocating police personnel in proportion 
to neighborhood population, the level of accuracy currently
embedded in the numbers is probably quite acceptable.
With regard to reapportionment, there is evidence,
discussed below, that the differential errors in the 1980 
census counts may have affected the allocation of one or 
two seats, a matter of some concern to the states 
involved. 
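
To see why a small differential error can move a seat, consider the method of equal proportions, under which House seats are assigned: each state receives one seat, and each remaining seat goes to the state with the highest priority value, its population divided by the square root of n(n+1), where n is the number of seats it already holds. The sketch below applies that rule to three hypothetical states and ten seats; the populations are invented and are not 1980 figures.

```python
import heapq
from math import sqrt


def apportion(populations, seats):
    """Assign seats by the method of equal proportions (one seat guaranteed each)."""
    alloc = {state: 1 for state in populations}
    # heapq is a min-heap, so priorities are stored as negatives.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)
    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        alloc[state] += 1
        n = alloc[state]
        # Priority of this state's claim on its next seat.
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return alloc


# Hypothetical true populations.
true_counts = {"A": 5_300, "B": 3_300, "C": 1_400}
# Same states, with state C undercounted by 3 percent (1,400 -> 1,358).
census_counts = {"A": 5_300, "B": 3_300, "C": 1_358}

seats_true = apportion(true_counts, 10)
seats_census = apportion(census_counts, 10)
```

With the true populations the seats divide 5, 3, and 2. With state C undercounted by 3 percent while the others are counted fully, its claim on a second seat falls just below state A's claim on a sixth, and the division becomes 6, 3, and 1. A uniform undercount in all three states would leave the result unchanged, which is why it is the differential error that matters here.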

Evaluation of the need for increased accuracy of 
census counts for uses in fund allocation formulas is 
difficult. Most formulas include other factors besides 
the population counts. The available limited evidence on 
the effect of errors in the basic counts on equity in 
fund allocation is reviewed in a separate section below. 
For many programs that allocate predetermined amounts of 
public monies, and for other key uses such as determina- 
tion of political representation, it is the differential 
errors among population subgroups and geographic areas 
that cause the most serious concern. Differential errors 
in coverage of basic age, race, and sex groups also have 
implications, discussed below, for postcensal population 
estimates and for important series, such as vital statis- 
tics that use census figures in the denominator, with 



[...] an area on the wrong side of a threshold [...]. Legislation
and administrative practices often [have avenues of] appeal for
areas that believe they have grown enough to cross a threshold
or not declined enough to drop below a threshold even though
[...] (see appendixes). [...] there is some evidence that errors
[in the] counts matter for important purposes and that [...]
promise to reduce [...] should be given [...] in Chapter 7, [...]


Small-Area and Subgroup Characteristics






[...] basic [...] detailed data
for many other characteristics on a comparable basis for
small as well as large [areas and for] subgroups of the
population. The 1980 census [...] a total [of] over 3[...]
population and 30 housing items [covering a] broad range of
topics, most items, 26 of the population [...]
[...] computer files as well as [...] documents,
cross-tabulate these items in a variety of ways. In order to
protect the confidentiality [...]

[...]

blocks and census socioeconomic data on the
population and housing of each area; 

Identifying the most "disadvantaged" areas in 
a city for locating service facilities; 

Conducting traffic planning studies related to 
peak loadings and based on cross-tabulated 
information on place of work and place of 
residence; 

Identifying concentrations of groups that are 
targets or potential targets of government 
programs (poor elderly persons living alone, 
youth without previous work experience, 
work-disabled persons);

Allocating funds to states and localities by
means of formulas (for example, age of housing
is a factor in one formula used for community
development block grants and children in
poverty is a factor in the formula for some
educational assistance programs); and

Redesigning major statistical programs, such 
as the Current Population Survey. 

Business planning and decision making, including: 

Locating retail outlets in terms of market 
potential based on area socioeconomic 
characteristics, such as income, occupation, 
education, home ownership, and housing value; 

Comparing the market potential of different 
cities, ZIP codes, or census tracts within 
cities; and 

Assessing the availability of needed occupa- 
tional skills in different labor market areas. 

Basic and applied socioeconomic research, 
including: 

Analyzing groups that represent reaggrega- 
tions, for example, persons in high-tech 
industries aggregated from detailed industry 
breakdowns; 

Issue-oriented analyses, for example, study of 
the assimilation of different immigrant groups 
or projections of shortages in selected 
occupations; 

Analysis and legal testimony related to 
affirmative action and equal employment 
opportunity programs and challenges to the 
representativeness of juries; and 



Analysis of relationships, for example,
characteristics of persons who moved during
the past 5 years, characteristics of families 
with adult children living at home. 

Although questions have been raised about the neces- 
sity of having the census collect all the characteristics 
data, the census does not by any means collect every kind 
of item that business leaders, government officials, and 
researchers might want. This is true even though the 
marginal cost of additional questions is low relative to 
the large fixed cost of obtaining the count and basic 
demographic information. Moreover, the items that are 
collected are not all obtained from every household or 
tabulated for every area. Over the decades, budget and 
operational constraints, demands for privacy, and con- 
siderations of the burden on the public have led the 
Census Bureau to a methodology that imposes the following 
kinds of restrictions on the data collected and tabulated: 

(1) The Census Bureau carefully reviews proposed 
items to be sure that the need for them justifies 
expenditure of public tax dollars. While the data are 
useful for many marketing and business planning purposes, 
the Census Bureau will not include questions solely for 
such purposes. For example, questions on number of pets 
are proposed and turned down virtually every decade. 
Similarly, questions that were asked in censuses through 
1970 on appliances, such as clothes dryer and TV, were 
eliminated from the 1980 census. In prior decades, these
items were justified for analysis of changes in standard 
of living in different areas of the country, but they no 
longer are. 

(2) The Census Bureau also carefully reviews items to 
determine whether they are needed at the block level and 
hence must be included on the short form administered to 
every household; whether tabulations for somewhat larger 
areas are sufficient so that the item should be included 
on one or more versions of the long form sent to only a 
sample of households; or whether some other vehicle (such 
as the CPS) could provide adequate geographic detail. 

(3) Question detail is limited to what it is judged 
self-respondents can handle in a reasonable amount of 
time and with a minimum of confusion. For example, the 
income question in the census specifies fewer categories 
than the corresponding questions in the Current Population 
Survey and many fewer categories than the corresponding
questions in the Survey of Income and Program Participation.
(The SIPP is administered in person by interviewers and
the CPS in person or over the telephone.)

(4) What is asked of every household is limited to 
what will fit on two facing pages; the number of items
asked of a sample of households is limited to what will 
fit on two additional pages per person plus a page and a 
half of housing items. Forms are designed and most 
questions formulated for machine tabulation. 

(5) Cross-tabulations are limited for smaller areas 
in order to protect the confidentiality of replies and 
prevent identification of individuals. 

The decennial census does not cover as many subjects 
or cover specific subjects in as great depth as many 
surveys, but it provides many more analytically relevant 
explanatory variables than most administrative records 
systems. The detail it provides can be cross-tabulated
in a multiplicity of ways without adversely affecting
reliability or raising confidentiality considerations.
The census is virtually the only source of detailed
comparable characteristics as well as totals for small
areas and small groups in the population. 

There are several important implications for methodology
stemming from these distinguishing features of the
census. Many of these are the same as for the basic 
counts: for comparative analysis the need to obtain a 
reading more or less at a point in time, the need to 
standardize all processes of data treatment and estima- 
tion, and the need for consistency across various 
tabulations and retrievals whether planned or ad hoc. 
Finally, there is limited evidence that the relatively 
small errors in the census population count, though 
possibly significant for analysis of certain very 
specific subgroups, are rarely significant for most 
cross-sectional uses of characteristics data. For these
uses, the improvement or adjustment of counts is less
significant than the reduction of errors and biases in
content (e.g., misreporting of marital status by single
mothers, miscoding of occupation and industry, errors in
income reporting, etc.). As an example, evidence discussed
below indicates that errors in the income component of the
revenue sharing formula have more impact on fund
distribution than errors in the population component.



Historical Time Series 



Population gros 

changes over time. (Since to 

follow individuals from census 

micro-level data do not exist 

uses referenced above, ranging fr 

market research to acVol"5 an i ^ Iocal P^^i 

nificance when they are carried Y ? 1S ' galn added 

on a comparable bas'is For e^pTe *T CW1 ' US t 

interest not only in the levef of waa^? is kee ^ 

to men and of blacks relative to JTT W men rela tive 

changes in the relative levl?, Whltes ' b ^ also in 

and r 



for 

analysis of 
BUreaU does n t 
' lon ^tudinal 

the kinds of 
Iocal P^^i to 



oft he census has additional 
ey nd the 






- 






tion counts, from census to cen < enSUS P pula 

subgroups, is disturbing A f ^ am ng PP u lation 
of a few percentage poTnt's f ot -^' ^ hange in cove " 
Population can introduce considerL^ ^ r His P^ic 
comparisons of growth rates forv unce ^ainty into 

Population. MoLover, because cIn^ US Se ^nts of the 
calibrate postcensal ^nuT^ Censuses are used to 
as population controiffor nationfl mat6S ' Whlch are u 
CPS, changes in coverage create df ^^^ SUCh as the 
series that are difficult to intern^f lnUitieS in time 
important are the uses of " D u la f P ^ *>**** more 



and white o expe 
death rates for .^i c 'S2'J."* Ctl 



Average 



of 



One consideration regarding periodicity of the census
is whether 10 years is the optimal interval. A decennial
census is mandated in the Constitution for reapportionment.
The Congress passed legislation requiring a mid-decade
census in 1985 and every year ending in 5 thereafter;
however, funds were never appropriated to carry out the
1985 mid-decade census. It appears to be the case that
errors in postcensal estimates dwarf errors in the census
itself (see further discussion below) and, therefore,
depending on the cost, it might be cost-effective to
conduct a mid-decade census to improve the data for
purposes such as fund allocation.



EFFECTS OF CENSUS ERROR ON KEY USES: REVIEW OF RESEARCH 

This section reviews extant research that attempts to
measure the impact of errors in the census on important
uses of the data. Typically, this is done by implementing
one or more types of "adjustment" of the census figures
and comparing results of using the pre- and postadjusted
data sets. Considering the political and economic power
that flows from the census through reapportionment,
redistricting, and fund allocation, as well as the concern
over possible inequities in the distribution of power
resulting from coverage and other errors, there has been
relatively little research on what difference census
errors make for the allocation of votes and funds.
Moreover, the research studies reported in the literature
and reviewed below are subject to limitations in scope
and method, so that their findings must be viewed with
caution.

The focus on research directed to reapportionment,
redistricting, and fund allocation is not meant to suggest
that these are the only important uses of census data or
that errors may not be a problem for other uses. As should
be evident from the previous discussion, census data are
used for a wide range of research, planning, and public
policy purposes. However, virtually no research has been
carried out on the effects of census errors for purposes
other than allocation. Keyfitz (1980) has expressed the
opinion that a considerable margin of error is tolerable
for most research and planning purposes. Others have
argued that, as an example, the use of census data for
establishing and monitoring equal employment opportunity
programs places requirements for accurate coding of
occupation for age, race, and sex groups in small
geographic areas that the census currently does not meet
(see Conk, 1981).

Before proceeding to review the published studies on 
effects of census errors for reapportionment, redistrict- 
ing, and fund allocation, we should be clear about what 
is encompassed in the term "census errors" and what is 
not. There are many kinds of error in collected data. 
In the census context, errors include: 

Coverage error (households/persons omitted;
households/persons erroneously included);

Unit nonresponse (households/persons known or
believed to exist but lacking forms);

Item nonresponse (households/persons with one or
more items blank); and

Misresponse (for example, underreporting or
overreporting of income).

In addition, the data become less useful the longer the 
time interval between collection and release. 

Current census methodology includes procedures to 
attempt to correct for some of the sources of error noted 
above, specifically, unit nonresponse and item non- 
response. However, these procedures are never completely 
accurate and may introduce added error. Most research 
studies have focused on coverage errors in the census; a 
few have also looked at the interaction of coverage error 
and misresponse for selected items. Users, such as 
government and business planners, when asked, have often 
noted that delays in release of census tabulations much 
more adversely affect their use of the data than do 
coverage or content errors. 1 

What none of the research covers and what the discus- 
sion in this chapter does not attempt to address are 
considerations of "error" in the larger sense. That is, 
even if a data set were completely accurate, it could 
well be the case that the application of the data in a 
formula resulted in an inequitable allocation because the 



2 Based on notes of Constance F. Citro from the session 
on census undercount, annual meeting of the Association 
of Public Data Users, October 25-26, 1984, Washington, 
D.C. 



variables did not in fact relate to the intended purpose
(see Keyfitz, 1980).

Finally, the panel recognizes the very difficult problems
in attempting to assess the implications for census
methodology, particularly with regard to adjustment, of
research findings about effects of census errors. Such
an assessment rests first on one's judgment about the
quality of the research, specifically as to: (1) the
accuracy and completeness of the estimates of errors in
the census applied in each study and (2) the completeness
and appropriateness of the methodology used for
evaluating the effects of applying a certain set of
correction factors. (For example, need a study of general
revenue sharing replicate all aspects of the complicated
formulas to assess adequately the implications of
estimated census errors?)

Assuming that the research results appear creditable,
one must further make a judgment as to whether the measured
effects of census errors are sufficiently important
to warrant adjustment, particularly given that any adjustment
procedure may itself add error and given the cost
associated with developing adjustment procedures and
gathering the input data for those procedures. In other
words, granting that a data set can never be completely
accurate, one must decide what constitutes sufficient
accuracy for particular uses and whether adjustments that
can be made represent sufficiently significant improvements.
Is it tolerable, for example, to have two congressional
seats misapportioned because of coverage errors
in the census? Four seats? Six seats? Is it tolerable
if research suggests that coverage errors do not affect
apportionment, but coverage and content errors result in
the states receiving, on average, about 1 percent more or
less revenue sharing funds than they should? Two percent
more or less? Four percent more or less? Is it tolerable
if areas with high proportions of blacks who are more
poorly counted than whites receive less in federal funds
from all the population-based formula programs than they
should?

Ultimately, these are political judgments. The panel
has concluded from the research discussed below and so
stated earlier in this chapter its belief that errors in
the census do make a difference for important purposes.
Throughout the remainder of the report, the panel supports
research and testing of methods that show potential to
reduce errors within reasonable cost limits. The panel's
recommendations are directed both to methods for coverage



Effects of Errors on Reapportionment



Qo o 

the allocation of congressional seats among the states.
The study first developed several sets of estimates of
net undercoverage for states, using synthetic estimation
(see Chapter 7) to carry down estimates of undercount for
major population groups. The different estimates included:



te ' other) 

net 



net unteo te'"^ the natton ^ "te 

spanish 



Each of these sei-<s f 

graphic distributions "2^11"***** sil " tlar 
states having net under^t r " T* " teS ' " lth nine 
with the n-tS" 



n- 

adait l0 nal sets were C rod,,~J ' 5 P eroen t. Three 

second scenario as follows! " m aificati of the 

P. below the poverty 



r. y 

f persons not in vty' *""* th6 net deroount rate 

for each^a^e* gX^rTei i"" **' " et -dercount rate 



state. ompeted for that group within 



greater, whe ,. ?";* tes "f 3 percent or 
3 Percent or greate^ u*"''"-"' rates of 



Correcting the state populations using the net undercount
estimates developed under each scenario and running
the corrected figures through the currently used
apportionment formula, a method called "equal proportions,"
gave the following results. Only the fifth and sixth
scenarios changed the apportionment from that using the
unadjusted census figures. Under scenario (5), Alabama
gained one seat and California lost one seat, while, under
scenario (6), Alabama gained one seat at the expense of
Oklahoma.
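
For readers unfamiliar with the formula, the method of equal proportions can be sketched as follows. Each state first receives one seat. Each remaining seat then goes to the state with the highest priority value, its population divided by the square root of n(n+1), where n is the number of seats the state already holds. The state names and populations below are hypothetical and serve only to show how a very small shift in population can move a seat.

```python
import heapq
from math import sqrt


def equal_proportions(populations, seats):
    """Apportion `seats` among states by the method of equal proportions.

    populations: {state: population}
    Returns {state: number of seats}.
    """
    if seats < len(populations):
        raise ValueError("every state must receive at least one seat")

    allocation = {state: 1 for state in populations}
    # heapq is a min-heap, so priorities are stored with a negative sign.
    heap = [(-pop / sqrt(1 * 2), state) for state, pop in populations.items()]
    heapq.heapify(heap)

    for _ in range(seats - len(populations)):
        _, state = heapq.heappop(heap)
        allocation[state] += 1
        n = allocation[state]
        heapq.heappush(heap, (-populations[state] / sqrt(n * (n + 1)), state))
    return allocation


# Hypothetical two-state example with three seats to assign.
print(equal_proportions({"A": 1000, "B": 999}, 3))
print(equal_proportions({"A": 1000, "B": 1001}, 3))
```

In this hypothetical case, raising the second state's population from 999 to 1,001 moves the third seat from the first state to the second, because seats are whole numbers and only the ranking of priority values matters.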

Carlucci (1980) noted that a subsequent Census Bureau
study (Siegel et al., 1977) that developed alternative
estimates of net undercoverage in the 1970 census for
states showed a greater impact of coverage errors on
apportionment. Adjusting the census figures with one set
of estimates developed in this later study produced a
change of one seat between Tennessee and Oklahoma, while
the use of another set produced changes of two seats
involving California, Texas, Ohio, and Oklahoma.

Kadane (1984), in a study of the consequences of
coverage errors in the 1980 census for reapportionment,
developed estimates of the population by state based on
the results of the 1980 Post-Enumeration Program,
specifically the PEP series 2-9 estimates (see Chapter 4
for a description of PEP). Application of Kadane's
estimates for reapportionment gave California an
additional seat at the expense of Pennsylvania.

Finally, a simulation study performed by Gilford
(1983) of congressional apportionment based on different
sets of state population estimates from the PEP showed
that the results are sensitive to the estimates used.
Gilford contends that the PEP results should not be used
for adjustment because (p. 2): "adjustment of state
population counts can cause counter-intuitive changes in
apportionment" and "the extreme volatility of apportionment
results based upon adjusted census counts
attributable solely to the random characteristics of the
particular PEP sample selected renders the PEP unsuitable
as a basis for adjusting the census for apportionment
purposes." It should be noted that the counterintuitive
changes reported by Gilford are largely the result of the
fact that states are not allocated fractional representation.
As Siegel (1975:13-14) commented, "Under the
method . . . used to determine the number of Congressmen
from each State, the shift in the population of a State
required to produce a change in the State's representation,
may be merely a few hundred persons or a few hundred
thousand persons, depending on the precise populations of
all the States." Moreover, Gilford used all 12 sets of 
PEP estimates, some of which are regarded as less 
plausible than others. There remain the conclusions 
that: (1) coverage errors in the 1970 and 1980 censuses 
affected at least one or two congressional seats and (2) 
considerable uncertainty remains as to the particular 
states (both winners and losers) that might have been 
affected. However, with respect to point (2), in 
Gilford's analysis for 1980, California gained at least 
one congressional seat under every scenario explored. 



Effects of Errors on Redistricting 

Determining the boundaries of congressional election dis- 
tricts, as well as districts for state legislative 
offices, is a state function. Prior to the "one man, one 
vote" Supreme Court decisions in the early 1960s (Reynolds
v. Sims and Baker v. Carr), the states redistricted when
absolutely necessary because reapportionment changed the 
number of seats and on occasion when the party in power 
believed it would be advantageous politically. States 
were notorious in allowing districts to vary greatly in 
population size. The Supreme Court decisions mandated 
strict requirements for population equality (no greater 
than a 1 percent difference in population between the 
largest and smallest congressional district and no greater 
than a 10 percent difference among state legislative 
districts; see Carlucci, 1980) as well as compactness and
contiguity of districts. In addition, the Voting Rights 
Act and growing awareness by minority groups of the 
effect of the composition of election districts on their 
political strength led to demands backed up by court 
actions for equal representation of important population 
groups. Hence, today, in addition to redistricting due 
to reapportionment (Bureau of the Census, no date-b:1):

States and localities are forced to redistrict 
because of challenges brought in court or because 
the Justice Department clearance mandated by the 
Voting Rights Act fails to occur. Between 1967 
and 1978, some two dozen cases concerned with 
state or congressional redistricting went to the 
Supreme Court. 
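
How a population-equality standard of this kind might be checked can be sketched as follows. The sketch assumes that the difference between the largest and smallest district is expressed as a percentage of the average district population; the text does not specify the base, and more than one definition has been used in practice. It also assumes that an undercount rate is the share of the true population missed. The district figures and rates are hypothetical.

```python
def percent_spread(district_pops):
    """Largest-minus-smallest district population, as a percentage of
    the average district population (assumed definition)."""
    average = sum(district_pops) / len(district_pops)
    return 100.0 * (max(district_pops) - min(district_pops)) / average


def apply_coverage_correction(district_pops, undercount_rates):
    """Divide each census count by (1 - rate) to estimate the true count."""
    return [pop / (1.0 - rate) for pop, rate in zip(district_pops, undercount_rates)]


# Hypothetical plan that looks equal on census counts.
census_counts = [500_000, 502_000, 498_000]
print(round(percent_spread(census_counts), 2))

# Hypothetical differential undercount across the same districts.
rates = [0.01, 0.03, 0.06]
corrected = apply_coverage_correction(census_counts, rates)
print(round(percent_spread(corrected), 2))
```

The point of the comparison is that districts drawn to be nearly equal on census counts need not be nearly equal once coverage differs among them.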



Census small-area data are critical for the modern 
redistricting process to meet the standards set by the 
courts. Currently, P.L. 94-171 requires that the Census 
Bureau transmit small-area population data to each state 
within one year after Census Day. To permit achieving 
equal population size among districts, the 1980 census 
P.L. 94-171 computer tapes provided to the states by 
April 1981 tabulated the population for each city block 
or enumeration district in unblocked areas. Although not 
required by law, the Census Bureau added race and Hispanic 
origin data to the tapes, as it was clear that states 
would need these data to justify their plans to the 
Justice Department and to survive court challenges. The 
states are currently requesting that the Census Bureau 
provide P.L. 94-171 data in 1991 with the addition of 
separate counts of the voting-age population. 

It is possible that differential coverage errors among 
population groups and areas could affect the degree of 
population equality actually achieved by a redistricting 
plan based on the decennial census data. The study by 
Siegel (1975), mentioned above, assessed the likely 
effects of coverage errors in the 1970 census on the 
composition of districts within a state or city. Given a 
predetermined number of seats, Siegel noted that a new 
legislative district must be carved out of existing dis- 
tricts, one of which must be eliminated. The possibility 
that adjustment of the census counts would result in an 
additional seat going to an area within a state or city 
at the expense of another area depends on the average 
size of the districts, the differential coverage rates of 
major population groups, the proportionate distribution 
among areas of the major groups, and the number of
contiguous districts with high undercoverage rates. Siegel's
analysis, assuming different rates of coverage of the 
white and black populations and different proportions of 
whites and blacks within areas, indicates that the 
possibility of a shift in the number of congressional 
districts for the regions of a state was very small. 
Siegel estimated that the chances of a shift in state 
legislative districts or city council districts were 
somewhat greater, but still small. 

Carlucci (1980) applied Siegel's calculations to New
York City, as an example, and judged that an adjustment 
for undercount in 1970 would not produce additional 
representation for that city. However, Carlucci pointed 
out that, if one made the additional assumption that all 
groups other than whites were undercounted at the same 



"70 



tr i=ting 



to 



Uld 



s 



cheoki "9 



iess -- 



. 

Bureau anoth *. ft numbers 



Wr 




I 

] 

c 

1 

fc 





P 
1 



but also 



59 





distcicts that differ 
H " ever ' ik "P^rs 

ften *" 
th6 dif =lt 
Ie 9 lsl "ure and 

C Urt Chal ~ 
of state 

nsus 



n their rate of 

hat states do non to r 

ice after each deoennUl oe ns * 

getting any plan approved by 

"rough Justice D 

="? in fact, ers of 

^gislatures stated that it 

thorized by P.L! 9^.52! " 

ovide small-area stetisti " Unded) " ere to 

^ seek to have the lat c^ e f 2 W Uld 9 to c "9s 

ch data for redrawinTstat. ! to Pclude the use of 

:eau of the Censnc ^ ""Sessional districts see 
n otl ng 

the 3r are 

"-an, oneote. ni e rh^T 6 "* Ot n the 
:roduced by errors Y SCf3 y efects 

-o disadvantage 



Effects of Errors on Fund Allocation



state and local assistance 



were 



I to states and 






budget 



- 1980). AS of fiscal 
la Bunted to ^^ 



TABLE 2.1 Uses of Census Data in Selected Federal Grant-in-Aid Programs



Program 



Data Items Used for Allocation 
and/ or Eligibility 



Fiscal 1984 

Expenditures 

(billions) 



Education 

Adult Education Act 
(P.L. 89-750) 



Career Education Incentive Act 
(P.L. 95-207) 

Education Consolidation and Im- 
provement Act 
(P.L. 97-35) 



Head Start 



Higher Education Act 
(P.L. 89-329) 



Public Libraries 

(P.L. 84-597) 
Vocational Education Act of 1963

(P.L. 94-482) 



Employment and Training 
Job Training Partnership Act 
(P.L. 97-300) 



Title III: allocates same base amount to      $0.8
all states and then remainder based on        (includes
state share of persons age 16 and older       career and
with less than 4 years of high school         vocational
completed (excluding persons ages             education)
16-19 currently enrolled)."

Allocates funds to states based on state See adult educa- 
share of population ages 5-18.* tion 

Chapter 1 , Educationally Deprived Chil- $3 . 4 
dren: allocates funds to school dis- 
tricts based on district share of chil- 
dren ages 5-17 in poverty. 

Chapter 2 , Consolidation of Federal Pro- $0.4 
grams for Elementary and Secondary 
Education: allocates funds to states 
based on share of children ages 5-17. b 

Allocates 87% of available funds to           $1.0
states based on state share of children
under 18 in AFDC families and children
under 6 in families in poverty."

Title IV-C, Work Study Program: allocates     $0.5
90% of funds to states as follows:
1/3 based on state share of persons
enrolled full time in postsecondary
schools; 1/3 based on state share of
high school graduates; 1/3 based on
state share of related children under 18
in families with income under
$3,000."

Allocates funds to states based on share N.A. 
of population.* 

Title I, Part A, State Vocational Educa- See adult educa- 
tion Programs: allocates 50% of funds tion 
to states based on share of population 
ages 15-19; 20% based on share of 
population ages 20-24; and 15% 
based on share of population ages 
25-65.* 



Eligible Service Delivery Areas (SDAs) $3.0 
must be one or more counties or cities (all pro- 
of 200,000 or more current population grams) 
(or a rural CETA prime sponsor or an 
exception approved by the governor).* 

Title II, Part A, Adult and Youth Pro- 
grams: allocates 1/3 of funds to states 
based on state share of unemployed in 
areas of substantial unemployment; 1/3



General Revenue Sharing

State and Local Fiscal Assistance 

Act of 1972 as amended, 

Title I) 



Housing

Community Development Block 
Grants (Housing and Com- 
munity Development Act of 
1974 as amended, Title I)



based on state share of excess unemployed;
1/3 based on state share of economically
disadvantaged population.
Uses same formula to allocate 78% of
each state's funds to SDAs. c

Title II, Part B, Summer Youth Employ- 
ment and Training: uses Part A for- 
mulas. 

Title III, Dislocated Workers: allocates
1/3 of funds to states based on state
share of unemployed; 1/3 based on state
share of excess unemployed; and 1/3
based on state share of persons unemployed
15 or more weeks. State-required matching
percentage is reduced 10% for each 1% higher
than average unemployment in previous
year. c



Allocates funds to states according to 1 $4.6 
of 2 formulas: 

(1) Allocates funds based on state
share of total population times tax
effort (state and local taxes divided by
personal income) times ratio of state
to national per capita income. b,d

(2) Allocates 2/9 of funds based on
state share of total population; 2/9
based on state share of population
divided by per capita income; 2/9 based
on state share of urbanized population;
1/6 based on state share of tax
effort; and 1/6 based on state share of
state income taxes. b,d

Allocates 100% of state funds to general
units of government in each state
based on local-government-unit share
of total population for units in the state
times tax effort times the ratio of government
unit to state per capita income. b,d



Eligible areas are cities with 50,000 or      $4.4
more population, metropolitan counties        (includes Urban
with 200,000 or more population, and          Development
some nonmetropolitan areas. b                 Action Grants)

Allocates 80% of funds to cities and
counties according to 1 of 2 formulas:






Federal Housing Act of 1949 as 
amended 



Public Assistance 

Aid to Families with Dependent 
Children (AFDC Social Se- 
curity Act, Title IV) 

Low Income Home Energy As- 
sistance 
(P.L. 97-35) 

Medicaid 

(Social Security Act, Title 
XIX) 



(1) Allocates 1/4 of funds based on area
share of total population for all eligible
areas; 1/2 based on area share of
persons in poverty; 1/4 based on area
share of overcrowded dwelling units
with more than 1.01 persons per
room. '*

(2) Allocates 1/5 of funds based on area
share of the growth lag for all areas;
3/10 based on area share of persons in
poverty; and 1/2 based on area share of
older housing built before 1940. a,b

Title 502, Housing Assistance Pro- $3.3 
grams: 

Eligible areas to receive insured and/ 
or guaranteed loans include nonmet- 
ropolitan areas with under 10,000 
population and areas between 10,000 
and 20,000 that face credit-shortages.* 

Allocates 3/10 of funds for insured loans
to states based on state share of rural
population living in inadequate housing;
3/10 based on state share of total
population; 3/10 based on state share of
rural population in poverty; 1/10 based
on state share of per capita housing
cost. -*

States distribute funds for insured loans 
to districts (groups of counties) and 
from districts to counties using same 
formula. 

Funds for guaranteed loans are dis- 
tributed using similar formulas, 
except that the share of rural house- 
holds with incomes between $15,000 
and $20,000 replaces the rural poverty 
factor. 



Determines federal matching percentage $6.6 
of state expenditures based on state 
share of 3-year average per capita 
income. d 

Allocates funds to states based on house- $1.9 
holds below the lower living standard 
income level and below 125% of pov- 
erty. 

Determines federal matching percentage $19.2 
of state expenditures based on state 
share of 3-year average per capita 
income/ 






Public Works

Construction Grants for Waste- 
water Treatment Works 



Recreation 

Urban Park and Recreation Recovery Program
(P.L. 95-625) 



Social Services 

Community Services Block 

Grants 

(P.L. 97-35) 
Older Americans Act of 1965 as

amended 

(P.L. 89-73) 



Runaway Youth Act 
(P.L. 96-509) 



Allocates 1/2 of funds to states based on     $2.5
formula A and 1/2 based on formula B:

(A) Allocates 1/4 of funds based on
state share of total population and
remainder based on state share of need
(based on construction costs and population
projections).*

(B) Allocates funds based on maximum
of state share of total population
and state share of needs.*



Title X: $0.03 

Eligible areas for funds are central cit- (est. ) 
ies of metropolitan areas, places of 
40,000 or more population, and 
counties of 250,000 or more popula- 
tion that score above the median on a 
composite variable including popula- 
tion density, net change in per capita 
income, percentage of unemployed, 
percentage of households with cars, 
population under 18 and 60 and older, 
and percentage of population in pov- 
erty. '* 



Allocates funds to states based on state $0.4 
share of population in poverty." 

States must submit a plan for services $0.7 
and designate Planning and Service 
Areas (PSAs), generally as counties or 
groups of counties based on total per- 
sons 60 and older and low-income per- 
sons 60 and older. 

Outreach required of PSAs with large 
numbers of persons 60 and older with 
limited English ability. 

Title III, Parts B and C, Supportive Ser- 
vices and Senior Centers and Nutri- 
tion Service: allocates funds based on 
population 60 and older.* 

Some states allocate funds to PSAs 
based on each PSA's share of persons 
60 and older below poverty." 

Allocates funds to states based on N.A. 
number of children under 19.* 






Social Services Block Grants 
(Title XX of Social Security 
Act) 

Transportation 

Highway Research, Planning and 

Construction 

(Title 23, U.S. Code) 



Urban Mass Transportation Act 
(modified by 1982 Surface 
Transportation Assistance Act) 



Allocates funds to states based on state 
share of total population.* 



Primary Systems Program: allocates 2/9
of funds to states based on state share
of total land area; 2/9 based on state
share of rural population (including
places under 5,000 outside urbanized
areas); 2/9 based on state share of mail
delivery route mileage; and 1/3 based
on urban population.

High-Hazard Locations Program: allocates
3/4 of funds to states based on
state share of total population and 1/4
based on state share of public road
mileage. 6

Section 5: provides funds for approved 
projects of metropolitan transporta- 
tion agencies. Surface transportation 
entitlement is determined for an 
urbanized area based on average of its 
share of total urbanized area popula- 
tion and its share of population den- 
sity." 



$11.2 
(all pro- 
grams) 



$3.9 



NOTE: Except for AFDC and Medicaid, the programs distribute shares of a fixed amount of funds. 
This is either because the allocation formula is explicitly share-based or because the amounts 
allocated are proportionately reduced to fit within an appropriations ceiling for the fiscal year. The 
allocation formula descriptions in the table omit many features affecting fund distribution such as 
hold-harmless and minimum and maximum amount provisions. 

N.A. = Not available.

a Decennial census data are the only reliable source for some or all formula elements.

b Can use census-based current population estimates and/or Current Population Survey (CPS) data
for some or all formula elements. CPS data are controlled to census-based current population
estimates.

c Data source is Bureau of Labor Statistics (BLS) local area unemployment estimates. These are
calibrated to the CPS, which is calibrated to census-based current population estimates.

d Data source is Bureau of Economic Analysis (BEA) per capita income estimates, based on BEA
personal income estimates divided by census-based current population estimates.

SOURCES: Bryce (1980); Emery et al. (1980); Gonzalez (1980); Herriot (1984: various 
unpublished attached documents such as copies of laws provided by federal agencies); Maurice and 
Nathan (1982); Office of Management and Budget (1985). 



data elements used in fund distribution. (See Appendix 
2.1 for a description of some state grant programs.) 

There is no requirement for federal funds to be allocated
according to formula comparable to the constitutional
mandate for reapportionment or court requirements for
redistricting. Indeed, the present administration has
worked to reduce the scope and extent of both formula-based
and categorical federal grant-in-aid programs.
However, it is clear that Congress has become accustomed 
to use formulas that eliminate the need for case-by-case 
decisions regarding fund applications from states and 
localities. It is likely that some formula-based programs 
will continue and that new programs of state and local 
assistance will in many cases be formula-based. Hence, 
research on the effects of errors in the census on the 
distribution of federal funds should contribute impor- 
tantly to the making of sound choices for decennial 
census methodology. 

Unfortunately, the available research to date in this 
area is severely limited both in method and in scope. No 
research has been completed that looks at the total set 
of grant programs; most research has concentrated on one 
program, general revenue sharing. Several factors have 
prevented a comprehensive analysis, including the lack of 
documentation of the formulas used and the complexity of 
many formulas. Emery et al. (1980:74,77), in a study 
attempting to document all of the formulas in use, noted: 

Among the formula grant programs . . . one-fourth 
failed to report the existence of a formula to OMB 
while others reported the existence of a formula 
but did not specify the factors involved. The 
lack of central documentation and the variability 
in agency documentation cause a large part of the 
uncertainty concerning how statistics affect 
assistance payments. . . . Notwithstanding the 
considerable vested interest and controversy 
surrounding the topic, the total number of 
programs having statistical allocations, the 
amount of money involved and the quality of data 
employed in calculating payments are unknown 
quantities. 

With regard to complexity of the formulas, the same 
authors note (p. 77) that "the simplest allocation 
formulas involve a calculation of a State's share of 
dollars based on the State's share of the total U.S. 
population. However, most allocation formulas are far 



Two key aspects of fund allocation formulas affect the 
extent to which errors in the census affect the 
distribution of grant program monies. First, whether a 
program distributes funds on a per capita basis or as 
shares of a fixed total sum is a determining factor in 
whether adjustment for census errors will cause a 
large change in the amount an area receives. 
Obviously, errors in coverage [...] directly to 
maldistribution of funds under programs that operate on a 
per capita basis. Maldistribution of funds under programs 
that allocate shares of a fixed total will generally occur 
only if the eligible areas experienced significantly 
different rates of net undercoverage. The exception is a 
program with a share-based formula that also includes an 
eligibility threshold. In this case, coverage errors will 
[...] affect the number of jurisdictions that are eligible 
to share in the fund allocation and hence will affect the 
distribution of the fixed total amount. At present, [...] 
grant programs that use formulas [...] shares of a fixed 
total, either because the formulas are explicitly 
share-based or because of ceilings on the total amounts 
that the programs can disburse during a fiscal year. 
Only [...] include eligibility [...] 

[...] whether a formula is [...] population counts or 
whether there are additional factors. To the extent that 
other factors dominate the formula, errors in coverage 
will have less effect on fund distribution under the 
program. Siegel (1975) analyzed the implications of 
adjusting the census counts in each state for a program 
that was assumed to allocate $1 billion [...] according to 
each state's share of total population. He found that, 
depending on which set of population estimates was used 
(see description of his scenarios in the discussion of 
reapportionment), [...] states experienced a 1 percent or 
greater shift in fund allotment even though 50 states 
(including the District of Columbia) had estimated net 
undercoverage [...] or greater under each scenario. The 
scenario that had the greatest effect modified the 
national undercount rates by race to take account of 
median family income. Under this scenario, [...] states 
experienced a shift [...] to 1.9 percent, 4 states a shift 
from [...], [...] of Columbia a shift [...] 



Most research in this area has focused on the general 
revenue sharing program, first authorized in 1972. The 
program distributed over $5 billion to 39,000 governmental 
units including states and localities in fiscal 
1981 and over $4.5 billion to localities in fiscal 1984 
under formulas based on population, per capita income, 
and tax effort factors. (States no longer receive 
revenue sharing funds, but the program still determines 
first the amount to be allocated in total to the localities 
in each state and then applies a separate formula 
to determine the share of each state's total for specific 
localities; see Table 2.1.) 

Siegel (1975), in an extension of the analysis just 
described, simulated the distribution of revenue sharing 
funds among the states. He compared the distributions 
using unadjusted 1970 census population and income data 
with distributions using: (1) adjusted population data 
but assuming that per capita income remained as before 
(that is, assuming that uncounted persons had the same 
income as counted persons); (2) unadjusted population 
data but per capita income data adjusted to Bureau of 
Economic Analysis control totals; and (3) adjusted 
population and per capita income data. 

The results showed that simply adjusting population 
never made any large number of changes in funds apportioned 
to the states under general revenue sharing. 
Using the basic synthetic adjustment of population by 
age, race, and sex, the distribution of funds shifted by 
more than 1 percent for only 5 states and by more than 2 
percent for only the District of Columbia. Using a 
modified population adjustment based on median family 
income, 8 states experienced a shift of 1 percent or more 
and 5 states a shift of 2 percent or more. Adjusting per 
capita income alone resulted in more significant changes: 
[?]5 states experienced a shift of 1 percent or more and 14 
states a 2 percent or greater shift, with 4 of those 
states experiencing a shift of 6 percent or more in their 
share of funds. Adjusting population and per capita 
income together also resulted in a larger number of 
changes, especially using the modified population 
adjustment based on median family income together with 
the income adjustment; under this scenario fully 32 
states experienced a shift of 1 percent or more in their 
fund allocation, 17 states a shift of 2 percent or more, 
and 5 states a shift of 6 percent or more. 
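
The kind of comparison described above can be illustrated with a minimal sketch. It assumes a purely share-based formula (a fixed total split in proportion to population) and uses made-up counts and net undercount rates for three areas labeled A, B, and C; none of these figures comes from Siegel's study, and the actual revenue sharing formula includes income and tax effort factors that are omitted here.

```python
def allocate_shares(total, populations):
    """Split a fixed total among areas in proportion to population."""
    grand_total = sum(populations.values())
    return {area: total * pop / grand_total for area, pop in populations.items()}


def percent_shift(before, after):
    """Percentage change in each area's allotment after adjustment."""
    return {area: 100.0 * (after[area] - before[area]) / before[area]
            for area in before}


# Hypothetical census counts and net undercount rates (illustrative only).
counted = {"A": 1_000_000, "B": 2_000_000, "C": 500_000}
undercount_rate = {"A": 0.01, "B": 0.03, "C": 0.02}

# Adjusted population = counted population / (1 - net undercount rate).
adjusted = {a: counted[a] / (1.0 - undercount_rate[a]) for a in counted}

before = allocate_shares(1_000_000_000, counted)
after = allocate_shares(1_000_000_000, adjusted)
shifts = percent_shift(before, after)

# If every area had the same undercount rate, shares would not move at all.
uniform = {a: counted[a] / (1.0 - 0.02) for a in counted}
uniform_shifts = percent_shift(before, allocate_shares(1_000_000_000, uniform))
```

In this sketch only differences in undercount rates among areas move money: the area with the highest rate gains, the area with the lowest rate loses, and a uniform rate leaves every allotment unchanged.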

Several studies have examined the effect of census 
errors on distribution of revenue sharing funds to 



localities. Siegel (1975:22) noted that "the role of the 
income component is even more dominant when the General 
Revenue Sharing formula is applied to counties and 
cities," because the income component in the tax effort 
factor as well as the per capita income factor requires 
adjustment. In addition, the formula for allocation to 
localities includes constraints so that no local area may 
receive less than 20 percent or greater than 145 percent 
of the state's average per capita payment or more than 50 
percent of the sum of its taxes and intergovernmental 
transfers. Siegel (1975:23) concluded that prior studies 
(Hill and Steffes, 1973; Savage and Windham, 1973; Strauss 
and Harkins, 1974; Grindley et al., 1974) "all fail to 
make adequate allowance in the application of the formula 
for the understatement of the income component or to take 
account of the apportionment features of the Act." 
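
The local-area constraints described in this paragraph can be sketched as a simple bounding step. The function below is an illustration only: its name, its arguments, and the order in which the limits are applied are assumptions, and it does not redistribute the amounts freed or required by the limits, which an actual allocation would need to do.

```python
def constrain_payment(raw_payment, population, state_avg_per_capita,
                      taxes_plus_transfers):
    """Apply the three limits described in the text to one local area.

    Simplified, single-pass illustration (hypothetical helper):
    - per capita payment no less than 20 percent of the state average
    - per capita payment no more than 145 percent of the state average
    - total payment no more than 50 percent of taxes plus transfers
    """
    per_capita = raw_payment / population
    floor = 0.20 * state_avg_per_capita
    ceiling = 1.45 * state_avg_per_capita
    per_capita = min(max(per_capita, floor), ceiling)
    payment = per_capita * population
    # Assumption: the 50 percent limit is applied last and can override the floor.
    return min(payment, 0.50 * taxes_plus_transfers)
```

A locality held at the floor or ceiling receives the same payment before and after a modest population adjustment, which is consistent with the finding that constrained areas were the least affected by adjustment.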

Robinson and Siegel (1979) carried out an illustrative 
study of the effects of 1970 census coverage and income 
reporting errors on distribution of general revenue shar- 
ing funds to localities within the states of Maryland and 
New Jersey. The results were similar to the findings in 
the earlier Siegel study for states in that adjustment of 
income had a greater effect than adjustment of popula- 
tion on shifts in the distribution of funds; however, the 
effects of adjustment were greater for local areas than 
for the states. States and local areas experienced 
similar average percentage shifts in fund distribution 
with just the population factor adjusted: 0.6 percent for 
the 50 states and D.C., 1.0 percent for the 155 local 
jurisdictions in Maryland, and 0.7 percent for the 567 
local areas in New Jersey. With income alone adjusted, 
the average percentage shift in funds was 1.8 percent for 
the states, 4.1 percent for the Maryland local areas, and 
4.4 percent for the New Jersey local areas; while, with 
both income and population adjusted, these figures 
became, respectively, 1.9 percent, 8.5 percent, and 9.1 
percent. The local areas most affected by adjustment 
were those not constrained by the minimum and maximum 
allotments specified in the formula. 

Another way of looking at the impact of census errors 
is on a per capita basis, that is, how much lost revenue 
from various fund allocation programs each additional 
uncounted person represents for a state or local area. 
Maurice and Nathan (1982) undertook to answer this 
question for three different programs: (1) general 
revenue sharing, (2) the Community Development Block 
Grant program, and (3) mass transit subsidies provided 




under section 5 of the Urban Mass Transportation Act. 
They investigated the simultaneous impact of a synthetic 
population adjustment using 1970 census national net 
undercount rates by race for 573 areas (central cities of 
standard metropolitan statistical areas or cities with 
more than 50,000 residents). Over half the cities had 
estimated net undercount rates of greater than 2 percent 
and almost one-fifth had estimated net undercount rates 
exceeding 3 percent with the application of the synthetic 
adjustment. 

Maurice and Nathan (1982:253) note that assertions are 
often made that each uncounted person represents a 
significant sum of money lost to a jurisdiction, e.g., the 
New York City planning department estimated that the city 
would lose $200 per year in federal aid for each resident 
missed in the census. In contrast, they find (1982:266): 
"For the majority of cities, the total change in allocation 
[for the three programs] resulting from an undercount 
adjustment of population is in the range of plus or 
minus $5 per uncounted person." For 18 large cities, the 
total change ranged from a loss of $11.80 (for Minneapolis) 
to a gain of $15.40 (for Philadelphia). They 
explain this result as a consequence of three phenomena: 
(1) the synthetic method of population adjustment produces 
small changes in cities' shares of the national population, 
(2) population is not the only factor in most fund 
allocation formulas, and (3) one of the most important 
programs, community development block grants, includes a 
population growth lag variable in one formula used by 
older distressed cities that gives larger allotments to 
cities with larger net undercounts. 
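
The per capita measure used in this comparison can be written out as a short sketch. The group labels, counts, undercount rates, and dollar amounts below are hypothetical, chosen only to make the arithmetic easy to follow; they are not taken from Maurice and Nathan.

```python
def synthetic_uncounted(counts_by_group, national_rates):
    """Estimate uncounted persons by applying national net undercount
    rates for each group to the area's counted population in that group."""
    return sum(count * national_rates[g] / (1.0 - national_rates[g])
               for g, count in counts_by_group.items())


def change_per_uncounted_person(alloc_unadjusted, alloc_adjusted, uncounted):
    """Dollar change in an area's allocation per estimated uncounted person."""
    return (alloc_adjusted - alloc_unadjusted) / uncounted


# Hypothetical city (illustrative numbers only).
counts = {"group_1": 392_000, "group_2": 95_000}
rates = {"group_1": 0.02, "group_2": 0.05}

uncounted = synthetic_uncounted(counts, rates)
per_person = change_per_uncounted_person(20_000_000, 20_065_000, uncounted)
```

With these assumed inputs the city has about 13,000 estimated uncounted persons and gains about $5 for each, the order of magnitude the authors report for most cities.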

Maurice and Nathan found greater effects of adjustment 
for coverage errors in the public service employment 
portion of the Comprehensive Employment and Training Act 
(CETA) program. For selected large cities, the change in 
fund allotment under this program ranged from a loss of 
$[amount illegible] per uncounted person for Los Angeles 
to a gain of $35 for New Orleans. However, they note that 
this finding could be sensitive to the assumptions used 
regarding the labor force status of uncounted persons. 
They also note that this component of CETA was not 
included in its successor program, the Job Training 
Partnership Act. 

Finally, most of the studies of the effects of census 
errors on fund allocation have found that, typically, 
more jurisdictions "lose" by an adjustment than "gain" 
compared with the distribution of funds using unadjusted 
census data. For example, Robinson and Siegel found that 






[...] the studies reported [...] should caution [...] 
undercount rates by [...] that modified [...] family 
income generally produced [...] effects [...] 

EFFECT OF ERRORS IN POSTCENSAL ESTIMATES 

[...] by preparing population 


TABLE 2.2 Percentage of Error in County Population Estimates for 1980 by 
Metropolitan-Nonmetropolitan Residence 

                              Metropolitan     Nonmetropolitan 
Percentage of Error           Counties (%)     Counties (%) 

Less than 1.0 percent             19.8              15.5 
1.0 to 2.9 percent                33.8              29.2 
3.0 to 4.9 percent                20.5              23.2 
5.0 to 9.9 percent                20.2              24.5 
10.0 percent or more               5.6               7.5 
Average absolute 
  percentage of error              3.7               4.3 

NOTE: Several different population estimation methods are used by the Census Bureau. These 
data and the data in Tables 2.3, 2.4, and 2.5 are for the method with the smallest absolute errors in 
1980. 

SOURCE: Unpublished Bureau of the Census tabulations. 



estimates for 1980 in the same ways they were made during 
the postcensal years of the 1970-1979 decade and comparing 
the estimates with the 1980 census counts (see 
Starsinic, 1983, for a description of the method and a 
report comparing 1980 estimates with the census counts 
for states). The comparisons show that the size of the 
errors in the postcensal estimates for areas below the 
state level dwarfs those in the census. This is not a 
reflection on the Census Bureau. A considerable amount 
of research has been conducted on the methodology for 
population estimation and the estimates have been improving 
over the years. However, there are at present 
inherent limitations in the data bases used to prepare 
the estimates and statistical manipulation can only 
partially correct for them. 

Tables 2.2 through 2.5 (extracted from forthcoming 
Census Bureau publications) illustrate the nature of the 
problem. The 1980 estimates for 7.1 percent of the 3,142 
counties in the United States had errors of 10 percent or 
more. The errors tended to be concentrated in the smaller 
counties: 18.8 percent of counties with population under 
5,000 and 9.4 percent of those between 5,000 and 10,000 
had errors of 10 percent or greater. However, errors of 
this size were not solely a small county phenomenon; of 
the 412 counties with 100,000 or more persons, 2.2 percent 
were off by 10 percent or more. Of course, the evaluation 
covered 1980 and the errors accumulate over time, so that 
these results probably reflect the situation only in the 






TABLE 2.3 Selected Measures of Accuracy of County Population Estimates 
for 1980, by Size of County 

                                                  Percentage of Counties 
                          Average Absolute        With Errors of 
Population of County      Percentage of Error     10.0 Percent or More 

Less than 5,000                  6.1                    18.8 
5,000 to 9,999                   4.8                     9.4 
10,000 to 24,999                 4.1                     7.1 
25,000 to 49,999                 4.0                     5.5 
50,000 to 99,999                 3.8                     3.2 
100,000 or more                  3.0                     2.2 
Total                            4.2                     7.1 

SOURCE: Unpublished Bureau of the Census tabulations. 

last few years of the decade. Even so, the potentially 
large impact on uses of the data is disturbing. 

The situation is even more serious at the subcounty 
level. The average absolute percentage error among the 
35,644 subcounty areas analyzed was 15.2 percent. As in 
the case of counties, the smaller areas were subject to 
greater errors, with the average percentage error ranging 
from 35 percent for areas with less than 100 persons to 4 
percent for those with over 100,000 population. Forty-eight 
percent of all areas had errors of 10 percent or 
greater. Of the 160 areas with 100,000 or more persons, 
4.4 percent had errors between 10 and 19 percent. Both 
positive and negative errors existed. For example, of 
the 6,012 places with errors of 25 percent or more, the 
errors were in the negative direction for 2,320 places 
and in the positive direction for 3,692. The difference 
in population estimates between some pairs of places 
could thus be off by more than 50 percent of their 
populations. 
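
The accuracy measures reported in Tables 2.2 through 2.5 can be computed along the following lines. This is a minimal sketch with invented numbers; the sign convention (estimate minus census count, so that a negative error means the estimate fell below the census) is an assumption, since the tables do not state it.

```python
def percentage_errors(estimates, census_counts):
    """Signed percentage error of each estimate relative to the census count.

    Assumed convention: 100 * (estimate - census) / census.
    """
    return [100.0 * (e - c) / c for e, c in zip(estimates, census_counts)]


def summarize(errors, threshold=10.0):
    """Return the average absolute percentage error and the percentage
    of areas whose absolute error is at or above the threshold."""
    n = len(errors)
    mean_abs = sum(abs(x) for x in errors) / n
    share_large = 100.0 * sum(1 for x in errors if abs(x) >= threshold) / n
    return mean_abs, share_large


# Hypothetical places (illustrative values only).
census = [100, 1000, 5000, 20000]
estimates = [130, 900, 5100, 20000]

errors = percentage_errors(estimates, census)
mean_abs, share_large = summarize(errors)
```

Averaging absolute values keeps positive and negative errors from canceling, which matters here because errors in both directions were common.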

These errors contrast with those in the census, where 
even the black-white differentials in coverage are not 
large enough to make it likely for places to be undercounted 
by more than a few percentage points. A comparison 
of the two sets of errors suggests that the emphasis 
on census errors in the past few years has been somewhat 
misplaced, and that users of the data would have been 
better served if some of the funds used to reduce 
undercoverage in the 1980 census could have been used to 
improve postcensal data. 

A detailed analysis of the postcensal estimates is not 
within the scope of the charge to the panel. However, we 




TABLE 2.4 Percentage of Error in Subcounty Population Estimates for 1980 

Percentage of Error       Percentage of Places 

-25.0 or more                    6.5 
-24.9 to -15.0                   8.6 
-14.9 to -10.0                   8.2 
 -9.9 to  -5.0                  12.7 
 -4.9 to  -0.1                  15.4 
  0.0 to   4.9                  14.1 
  5.0 to   9.9                   9.7 
 10.0 to  14.9                   6.7 
 15.0 to  24.9                   7.6 
 25.0 to  49.9                   7.0 
 50.0 or more                    3.4 

NOTE: There were 35,644 places for which estimates were made and evaluated. 
SOURCE: Unpublished Bureau of the Census tabulations. 



strongly urge the Census Bureau to examine the 
cost-effectiveness of a mid-decade census compared with the 
cost-effectiveness of the extra effort required to achieve 
the last one-half to one percent coverage improvement in 
accuracy of the decennial census. If, as we suspect, a 
mid-decade census would significantly improve the usefulness 
of the data for key purposes, such as allocation of 
federal and state funds, compared with marginal coverage 
improvement efforts in the census, this fact should be 
transmitted to the administration with a strong recommendation 
that funds be budgeted for a mid-decade program 
for 1995. We realize that diverting some coverage 
improvement funds from the decennial census to the 
mid-decade census will only partly support the latter program. 
The additional support needed would be more than justified, 
in our view, if further study demonstrates the 
value of a mid-decade census for importantly improving 
overall data quality. A mid-decade census program may 
also afford operational advantages for census-taking, 
such as facilitating retention of experienced staff, that 
would further improve data quality and/or reduce the per 
person costs. 

We recognize that the temper of the times is not conducive 
to the initiation of new programs, but we believe 
that statisticians have the responsibility to describe 
the facts and recommend the actions they believe are 
sensible. We think it highly likely that reallocation of 
funds from marginal efforts to achieve small reductions 
in the decennial census undercount to a mid-decade program 
would improve overall data accuracy and thus contribute 
to equitable political representation, fund 
allocation, and public administration. The panel urges 
that these issues be thoroughly explored before the 1990 
census plans are finalized. 

TABLE 2.5 Selected Measures of Accuracy of Subcounty Population 
Estimates for 1980, by Size of Area 

                                Average        Percentage of Areas With Errors of: 
                    Number      Absolute 
Population          of          Percentage     Less Than    10% to      20% or 
of Area             Areas       of Error       10%          19.9%       More 

Total               35,644        15.2           51.9         24.5        23.6 
Under 100            2,425        35.1           21.4         20.0        58.6 
100-499             11,085        19.8           37.5         26.9        35.6 
500-999              6,613        13.2           52.2         27.7        20.2 
1,000-2,499          7,141        11.6           58.6         26.3        15.1 
2,500-4,999          3,348         9.6           66.7         22.6        10.6 
5,000-9,999          2,212         8.3           72.3         20.6         7.1 
10,000-24,999        1,740         6.5           80.6         14.4         4.9 
25,000-49,999          636         5.5           84.9         11.9         3.1 
50,000-99,999          284         4.5           93.3          6.0         0.7 
100,000 or more        160         3.9           95.6          4.4         -0- 

SOURCE: Unpublished Bureau of the Census tabulations. 

Recommendation 2.1. We recommend that the Census 
Bureau assess the need for a mid-decade census, 
particularly by studying the effect of errors in 
postcensal population estimates compared with errors 
in the decennial census on major data uses. Unless 
these studies do not support the value of a mid-decade 
census, the Census Bureau should proceed with preparations 
and make every effort to secure funding to 
conduct a census in 1995. 



APPENDIX 2.1 
STATE AND LOCAL GOVERNMENT USES OF CENSUS DATA 

Government agencies at all levels (federal, state, and 
local) are heavy users of census data. This appendix 
reviews typical applications of census data made by state 
and local agencies.1 At these levels of government, 
the decennial census is an invaluable and unmatched 
resource in providing comparable small-area and subgroup 
data. 

STATE USES OF CENSUS DATA 

State governments use census tabulations in ways that are 
similar to federal and local uses and in ways that are 
unique to the states' role in the federal system. Based 
on a review of uses specified by a reasonably representa- 
tive group of states (Alaska, Connecticut, Florida, 
Georgia, Illinois, Indiana, Iowa, Missouri, Montana, New 
Jersey, New York, Oregon, Tennessee, Virginia, Wisconsin), 
the kinds of applications described below are typical for 
this level of government. 

Use for Redistricting 

The states determine the boundaries of congressional 
election districts, as well as districts for state 
legislative offices. Under the "one man, one vote" 
requirements imposed by the courts for equal population 
size and compactness of districts, small-area census data 
are essential for the task of redistricting. The chapter 
text indicated the data requirements for this use of the 
decennial census figures and reviewed potential 



1 Much of the material in this section comes from a 
survey of federal, state, and local government agencies 
initiated by the Census Bureau in fall 1982 requesting 
information on specific needs for subject matter and 
geographic detail from the census for uses mandated in 
legislation. The responses are summarized in Herriot 
(1984). Many agencies indicated other kinds of uses in 
addition to mandated ones. 


problems posed by differential undercoverage and by 
discovery of other kinds of errors, such as processing 
mistakes, subsequent to release of the redistricting 
tabulations one year after Census Day. 



Use to Classify Local Governments 

All the states denominate various categories of local 
governments, such as municipalities or townships, by 
population size and accord varying rights and 
responsibilities to each size class. For example, compensation 
of county clerks in Missouri is established as a function 
of population size and assessed valuation. This 
application uses census figures as thresholds, and hence 
coverage errors can be important if a locality is put in the 
wrong size class. However, many state statutes include 
language that permits localities to submit alternative 
population counts, for example, from special censuses. 



Use to Allocate State Funds 

Many states have programs to allocate state monies to 
localities on the basis of formulas similar to federal 
programs like general revenue sharing. For example, the 
State of Alaska has a state revenue sharing program that 
distributes money to municipalities and unincorporated 
places. The State of Florida allocates its 2 cents per 
gallon gasoline tax to counties via a formula that 
includes three terms for each county: 

One-fourth the ratio of the county land area to 
the state, plus one-fourth the ratio of the county 
population to the state, plus one-half the ratio 
of the county gasoline tax dollars to the state. 
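
The three-term formula quoted above can be expressed directly in code. The sketch below is illustrative only: the county names and figures are hypothetical, and the function name is not drawn from any statute or agency documentation.

```python
def county_share(area, population, gas_tax,
                 state_area, state_population, state_gas_tax):
    """Share of the state total for one county under the quoted formula:
    1/4 land area ratio + 1/4 population ratio + 1/2 gasoline tax ratio."""
    return (0.25 * area / state_area
            + 0.25 * population / state_population
            + 0.50 * gas_tax / state_gas_tax)


# Hypothetical two-county state (illustrative values only).
counties = {
    "County X": {"area": 600.0, "population": 300_000, "gas_tax": 4_000_000.0},
    "County Y": {"area": 400.0, "population": 100_000, "gas_tax": 1_000_000.0},
}
state_area = sum(c["area"] for c in counties.values())
state_pop = sum(c["population"] for c in counties.values())
state_tax = sum(c["gas_tax"] for c in counties.values())

shares = {
    name: county_share(c["area"], c["population"], c["gas_tax"],
                       state_area, state_pop, state_tax)
    for name, c in counties.items()
}
```

Because population carries only one-fourth of the weight, a given percentage error in a county's population share changes its share of the funds by at most one-fourth of that amount, all else equal.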

Most states with a motor vehicle fuel sales tax 
distribute the receipts using a formula including local 
population counts. Many states likewise distribute the 
proceeds of consumption or nuisance taxes, such as 
pari-mutuel, cigarette, and alcoholic beverage taxes, on the 
basis of population (Bryce, 1980:112-113). The State of 
New York allocates funds for building code enforcement to 
counties and cities using a formula that includes each 
area's share of the total noninstitutionalized population 
and of total real property valuation, while Iowa allocates 




day care center funds on the basis of numbers of children 
under 7 and low-income families. The equity of the 
distribution of monies under these various state programs is 
presumably affected by differential undercoverage. The 
chapter text discusses what is known about the effects of 
errors in the census count on fund allocation formulas 
for various federal grant programs. 



Use for Equal Employment Opportunity Purposes 

Every state in the nation has requirements, in 
legislation or executive order, for state agencies to 
implement one or more kinds of equal employment opportunity 
(EEO) or affirmative action programs with regard to 
hiring and personnel practices. State agencies make use 
of census data to establish affirmative action goals and 
to monitor how well equal employment opportunity programs 
are meeting their goals. The most common data require- 
ments are for occupation by race and sex for counties and 
cities. Many states also need data on occupation and 
industry by age, veteran status, disability, and language 
spoken. After 1980, the Census Bureau provided a special 
EEO file that contained detailed occupation cross- 
tabulated by sex, race, and Hispanic origin, plus years 
of school completed tabulated by age, sex, race, and 
Hispanic origin for counties, cities of 50,000 or more 
population, and metropolitan areas. The Census Bureau's 
Data User Services Division sold over 330 copies of this 
file directly to users, in addition to providing copies 
to all State Data Centers (from information furnished by 
Michael Garland, Chief, Data User Services Division). 

There are many related applications of census data by 
the states in the area of antidiscrimination efforts. 
The State of Missouri anti-redlining statute requires the 
Department of Commerce to monitor bank compliance using 
data on the characteristics of the housing stock (number 
of units, tenure, etc.) and of the population (race and 
income) by census tract in several cities. 

EEO applications of census data require tabulations of 
groups such as blacks and Hispanics that are known to be 
covered less well than other groups. Moreover, these 
applications require additional data such as occupation 
and income, and it may well be that errors or problems 
with these items have greater impact on the validity of 
conclusions drawn or actions taken on the basis of the 
cross-tabulations than simply differential undercoverage 
by race. 






Use for Implementation of Federal Programs 

Many federal programs that distribute funds to states and 
localities require applications for specific programs or 
projects rather than simply allocating dollars according 
to formula. States use census data to support grant 
applications of all kinds. For example, the State of 
Florida Department of Health and Rehabilitative Services 
needs data on the elderly population (persons 60 and 
over) by race in each county to justify funds for social 
and nutrition services programs under the Older Americans 
Act; the Florida Department of State, Division of Library 
Services, needs census data on income by age, race, and 
Hispanic origin for counties and cities for funding under 
the Library Services and Construction Act. 

Use for Statewide Planning 

The states use census data for many kinds of planning 
purposes. Just to name a few examples, the Alaska 
Department of Natural Resources requires small-area data 
on population, income, employment by industry, household 
size, and permanent versus seasonal residence for planning 
various park and recreation programs. The Florida 
Department of Transportation has statute-based 
requirements for census data on population, density, income, 
auto ownership, and employment by occupation and industry 
for small areas for statewide transportation planning. 
The Florida Department of Education needs census data on 
age, sex, education, income and poverty by county, and 
current county population estimates by single years of 
age for community college, state university, and adult 
education program planning. The Missouri Department of 
Agriculture uses county population categorized by age to 
plan publicity for the Missouri State Fair, and the 
Department of Mental Health develops measures of 
prevalence of mental disorders, alcoholism, and drug 
abuse, and plans service programs using census tract 
data. The State of Indiana uses county and census tract 
population and counts of housing units with basements for 
planning nuclear civil protection. 

A related use is to determine workload needs for 
various state services. For example, under its Omnibus 
Crime Control and Safe Streets Act, the State of Montana 
uses census data for counties and cities on sex, race, 
age, and income to estimate personnel needs and workload 
for public safety programs. 



LOCAL AGENCY USES OF CENSUS DATA 

Local governments exhibit many of the same kinds of uses 
of census data as do the states, including use of the 
data for redistricting.2 If anything, localities have 
a greater need for census data for very small areas, such 
as blocks and tracts. 

Typical census data uses cited by specific local 
governments include the following: 

Use for transportation planning, including 
planning of highways and other commuter transportation 
modes and forecasting airport demand (Orange County, 
Fla.; Pueblo Regional Planning Commission, Colo.; Corpus 
Christi, Tex.; Tri-County Regional Planning Commission, 
Harrisburg, Pa.; Lincoln City-Lancaster County Planning 
Department, Neb.; City of Detroit, Mich.). 

Use for planning local building and development 
projects (Houston, Tex.; Tri-County Regional Planning 
Commission) and for obtaining mortgage revenue bonds 
(Amarillo, Tex.). 

Use for services assessment and planning, such as 
needs assessments for human resources services in local 
community target areas using data on the elderly living 
alone, female-headed families, and children by census 
tract (Houston); development of state-mandated community 
services area plan (Orange County). 

Use to support grant applications for state and 
federal funds, for example, determination of transit 
subsidies from the regional transit authority using 
population and automobile availability by small area 
(Detroit); applications to the state small communities 
program using data by block and tract on population, 
housing, employment, income and poverty (Tri-County 
Regional Planning Commission); support of applications 
for family planning services project grants using data on 


2 Much of the material in this section comes from the 
Census Bureau survey previously cited (see Herriot, 
1984). This survey obtained responses from a small 
number of cities (fewer than 20), and most of these cities 
noted that they had relatively few mandated uses of 
census data. However, the examples of census data use 
discussed in this section appear to represent typical 
local applications. 




ethnicity, age, income and poverty for women ages 15-4[?] 
(County of San Diego Department of Planning and Land Use, 
Calif.). 

Some formula grant programs, in addition to the 
categorical programs, place data requirements on localities 
above and beyond the need for the items that go into the 
formula. For example, the HUD Community Development 
Block Grant (CDBG) program has one set of data needs to 
determine fund allocation, another set to use in a 
Housing Assistance Plan (HAP) that each locality must 
develop before the CDBG funds to which the locality is 
entitled can be released, and yet a third set to monitor 
the impact of the program on housing for low- and 
moderate-income, minority, and female-headed households. 



APPENDIX 2.2 
CENSUS DATA USE IN NEW JERSEY: A CASE STUDY 

This appendix endeavors to sketch a picture of census 
data uses and users in one geographic location. New 
Jersey was chosen because of ready availability to the 
panel of relevant information. Examples of uses from all 
sectors (public, private, academic) are included.

DISTRIBUTION CHANNELS 

Before describing users and uses, it will help orient the 
exposition to identify the various channels for dis- 
tributing census data within New Jersey. The federal 
government offers documents, including census publica- 
tions, for sale through the U.S. Government Printing 
Office and, by law, makes free reference copies available 
to the nation's 1,350 depository libraries. Rowe has 
estimated (U.S. House of Representatives, 1982:424) that
perhaps as much as 50 percent of census data use is by 
the millions of people who visit libraries every day to 
obtain needed information on a variety of subjects. 

Census data in nonprinted form, including tabulations
on computer tape (summary tape files), tabulations on
microfilm and microfiche, and samples of individual
microdata records (public use microdata sample files),
are sold directly by the Census Bureau's Data User
Services Division (DUSD). Census tape files serve a
growing need for more elaborate and extensive analysis 
than printed reports can readily serve. The tapes 
contain many more data items than can be printed in a 
manageable set of volumes and offer the advantage that 
the user can readily reprocess the data using computers. 
The availability of samples of microdata records (with 
identifying information removed) has greatly expanded the 
capabilities for original analysis and retabulation of 
the census responses to suit the user's needs. 

The Census Bureau has also set up a network of state 
data centers that receive publications and computer tapes 
containing the census tabulations for their state for 
redistribution to users. The typical structure includes 
a lead agency in the state government that works with the 
state library and one or more universities to provide a 
full range of user services, plus a number of affiliates 
that provide basic reference services throughout the
state. Currently, there are data centers in 49 states,
Puerto Rico, and the Virgin Islands (Riche, 1984b).

New Jersey is one of the states with an active state
data center. The New Jersey State Data Center is housed
in the Department of Labor and Industry and works with
the New Jersey State Library and with Princeton University
and Rutgers University to provide a full range of
processing and reference services to users. The center has
as local affiliates the planning boards for each of the
state's 21 counties plus the Delaware Valley Regional
Planning Commission. In addition, all county libraries
receive State Data Center materials.

Finally, a growing number of private firms are in the
business of supplying users with census (and other public)
data. A recent survey by American Demographics (Riche,
1984a) identified 68 firms that repackage and resell
government statistical data, one of which is located in
New Jersey. While these firms handle some general
information requests, most of their work is for clients
who need specific tabulations or analyses that often
require extensive processing of computerized census
data. Many of these firms provide a range of services
based on census data, such as profiles and projections of
local area characteristics for site selection and market
analysis; relating client information such as number of
accounts to census characteristics for ZIP codes or other
areas; and development of sampling frames and designs for
local surveys. Other firms specialize in such services
as using census data for election campaigning and voter
registration drives, affirmative action planning and
legal actions, and fulfillment of regulatory requirements.
In fact, it is probably the case that these firms serve
more users of census tapes than does the Data User
Services Division. The DUSD supplies tape copies to users
and will prepare special summaries of the confidential
microdata tapes, but does not make extracts or special
tabulations of publicly available data tapes. The DUSD
filled over 5,100 orders for 1980 census computer tapes
from 1981 to 1983 (from information furnished by Michael
Garland), representing a small fraction of total user
orders for tapes and analyses and tabulations produced
from the data tapes.




TABLE 2.6 Data Requests Received by the New Jersey State Data Center's
Lead Agency by Type of User and Data Source, 1983

                               Census of                               Other        Lead
                              Population    Economic       Other     Federal      Agency
Type of User                 and Housing    Censuses    Censuses        Data        Data       Total

Academic                            7.0%        3.7%        4.2%        9.5%        6.6%        6.8%
Business                           22.9        54.3        43.5        22.9        28.3        27.7
Government                         58.6        33.3        19.3        22.7        19.5        39.9
Private individuals                11.5         8.6        32.9        44.9        45.6        25.6

Total number                      2,024         243         331         568         965       4,131
Percentage of overall total        49.0%        5.9%        8.0%       13.7%       23.4%      100.0%

SOURCE: Connie O. Hughes, director, New Jersey State Data Center, personal communication to
Constance F. Citro, March 1984.
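
As a quick arithmetic check on Table 2.6, the "percentage of overall total" row can be recomputed from the "total number" row. The sketch below is illustrative only; the dictionary and function names are ours, not the State Data Center's, and the counts are copied from the table.

```python
# Request counts by data source, copied from the "Total number" row of Table 2.6.
# The variable and function names are hypothetical, chosen for this illustration.
requests_by_source = {
    "Census of Population and Housing": 2024,
    "Economic Censuses": 243,
    "Other Censuses": 331,
    "Other Federal Data": 568,
    "Lead Agency Data": 965,
}


def percentage_shares(counts):
    """Return each count as a percentage of the grand total, to one decimal."""
    total = sum(counts.values())
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}


total_requests = sum(requests_by_source.values())
shares = percentage_shares(requests_by_source)

for name, share in shares.items():
    print(f"{name}: {share}%")
print("All sources:", total_requests)
```

The recomputed shares agree with the published row, and the five counts add to the 4,131 requests shown in the Total column.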



PROFILE OF STATE DATA CENTER USERS AND USES 

The New Jersey State Data Center lead agency, the Office
of Planning and Research (OPR) within the State Department
of Labor and Industry, serves a large number of
census data users each year. The agency has tracked data
requests received by phone and reported that in 1982
phone requests totalled 3,600, rising to over 4,100 in
1983 (from information provided by Connie O. Hughes,
director of the State Data Center). The increase
resulted despite the policy effective July 1, 1982, of
reduced direct service to the general public due to
budget cuts incurred by OPR. Fully half the requests in
1983 (over 2,000) were for data from the 1980 decennial
census. Almost three-fifths of the census requests were 
received from other government agencies, about one- 
quarter from businesses, 10 percent from private 
individuals, and 7 percent from academia. Table 2.6 
shows the distribution of requests by type of data (1980 
decennial census, economic censuses, other censuses, 
other federal data, OPR data) and type of user. 

The Princeton University Computer Center, which works 
closely with the lead agency, reported on a week's sample 
of census use in spring 1984 (from information provided 
by Judith S. Rowe, associate director) . Projects that 
took some amount of staff time included: 




A study of migration patterns and the demographic
characteristics of the 1975 residents of the service area
of a utility company in Texas, using the public use
microdata sample files (PUMS);

A study of migration from Long Island compared
with migration from similar metropolitan areas, including
age, occupation, income, and other characteristics of
out-migrants, for a Long Island newspaper, using t

Development of profiles from the PUMS of recruitment
pools (age, income, race) by district for the
military services;

Construction of a data set merging selected
summary tape file 1 and summary tape file 3 data for
minor civil divisions and unincorporated places on
housing and homeowners for a private company that is
supplying data to realtors;

An analysis of voting behavior in Chicago using
summary tape file 1 to define neighborhoods along racial
and ethnic lines for an undergraduate student in political
science;

An analysis of need and ability to pay for home
health care using summary tape file 4 tables on age and
income for a graduate student at Wharton, employed by a
New Jersey hospital; and

An analysis as part of a continuing study of
commuting patterns in New Jersey using the Urban
Transportation Package of special tabulations of place of work
and journey to work data for a professor in the
transportation program.

The Rutgers Center for Computer and Information
Services, the other main component of the New Jersey
State Data Center, described its activities for 19
(from information furnished by Gertrude J. Lewis, ]
leader). The center keeps current copies of Rutgers
University Guide to Machine Readable Data Files in
the university's libraries, and sophisticated users
access the available files, which include census and
other data sets, without the center's active help. In
1983, almost 30 different departments specifically
requested the center's machine-readable data files
the decennial census files, the center handles phone
calls from many business firms inside and outside the
state. Users are encouraged to do their own compu
with census data. Three times a year, the center <
seminars on using census files and also offers special
seminars on request.



Examples of census data use at Rutgers include: 

Two faculty members in the Department of Sociology 
compared health needs of the poor with their service 
utilization using PUMS files and data from the National 
Center for Health Statistics;

A faculty member in sociology and urban studies 
analyzed change in housing prices and characteristics 
between 1970 and 1980 at the minor civil division level 
both for research and instruction to undergraduates and 
graduates;

A staff member of the affirmative action depart- 
ment used data from the special EEO file to construct 
figures on availability of minorities and women to 
determine utilization in the university's work force; 

A faculty member in the Department of Agriculture/ 
Economics compared state and county population figures 
among cities and places in the United States for 1970 and 
1980; 

A professor in the Graduate School of Management 
carried out research on travel behavior between 1970 and 
1980 with emphasis on transportation and the energy 
crisis; 

A graduate student in geography for his doctoral 
thesis used census population and housing characteristics 
to correlate the rate of subsidies at the census tract 
level in Manhattan; 

Undergraduate students in geography extracted 
census data and mapped the data using SAS/GRAPH; 

A graduate student in the School of Criminal 
Justice correlated census demographic data at the block 
group level with criminal data for his dissertation; 

A researcher in the Center for Urban Policy 
Research assists research personnel throughout the year 
in accessing American Housing Survey and decennial census 
data. These projects cover a variety of topics, such as
assessing population change for planning boards and 
studying segregation and integration within the state; 
and 

A research student in the Department of 
Agriculture/Economics accessed census data to analyze 
factors affecting employment change between 1970 and 1980 
in rural communities in the United States. 



GOVERNMENT USES OF CENSUS DATA IN NEW JERSEY 

The State of New Jersey regularly uses census data for 
many purposes, typical of state governments across the 
country. These uses include:

Redrawing congressional and state legislative 
districts. 

Classifying local governments. A review of the 
state code in 1973 identified over 800 statutes that 
referenced population data; most of these references
classified local governments by size and stipulated the
rights and responsibilities of each class. For example,
the term of office of street commissioner is three years
in cities of the second class, with population of 100,000
to 250,000.

Apportioning state funds to localities. New 
Jersey apportions motor vehicle fuel and general sales 
tax dollars to local jurisdictions based on population. 

Apportioning other kinds of services. New Jersey 
law states that localities may not grant retail liquor 
licenses in excess of 1 for every 3,000 population nor 
wholesale liquor licenses in excess of 1 for every 7,500
population, "as shown by the last then preceding Federal
census" (although a municipality with fewer than 1,000
population can have one wholesale and one retail license
in any case); members of the board of trustees for a
community college that serves more than one county are 
allotted to each county based on population. 

Meeting equal opportunity requirements. New 
Jersey requires all agencies to develop equal employment 
opportunity plans and to monitor their progress in 
meeting EEO goals using data on the civilian labor force 
by race and sex for the state, counties, and cities; the 
Department of Banking uses data on housing stock 
characteristics such as number of units and tenure and 
on population by race and income for all incorporated 
places to enforce the state's anti-redlining statute. 

Approving applications. The Department of 
Banking approves applications for bank charters and bank 
branches based on economic feasibility determined from 
analysis of population, number and size of households,
and income by census tract for the area to be served; the
Division of Mental Health and Hospitals allocates funds 
to community agencies according to past performance and 
need-based plans submitted by each agency that analyze 
data on age, income, marital status, race, and other 
characteristics for the census tracts and places served. 
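
The liquor license ceilings mentioned above show how a census count enters directly into a statutory formula. The following is a minimal sketch of that arithmetic under a literal reading of the summary given here: one retail license per 3,000 population, one wholesale license per 7,500 population, and one of each in any case for municipalities under 1,000. The function name and the use of whole-number division are our assumptions; the statute itself is not reproduced in this appendix and may treat fractions or other cases differently.

```python
def max_liquor_licenses(population):
    """Return (retail_cap, wholesale_cap) for a municipality.

    Hypothetical helper illustrating the population-based ceilings
    summarized in the text; not a statement of the statute itself.
    """
    if population < 0:
        raise ValueError("population must be non-negative")
    retail = population // 3000      # at most 1 retail license per 3,000 population
    wholesale = population // 7500   # at most 1 wholesale license per 7,500 population
    if population < 1000:
        # Exception noted in the text: a municipality with fewer than
        # 1,000 population can have one of each license in any case.
        retail = max(retail, 1)
        wholesale = max(wholesale, 1)
    return retail, wholesale


print(max_liquor_licenses(30000))
print(max_liquor_licenses(900))
```

Because the ceilings are tied to "the last then preceding Federal census," a change in a municipality's census count changes the result of this calculation for the following decade.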



Census Methodology: 
Prior Practice and 
Current Test Plans 



The panel believes it is important to consider proposed 
changes in the methodology of the decennial census in the 
context of past experience. Changes that depart greatly
from recent methodology need careful consideration of 
their costs and benefits. Review of practices in previous 
U.S. censuses and in the censuses of other Western 
nations can also suggest ideas that may be worth adopting 
for future censuses in the United States. The purpose of 
this chapter is to provide background for the discussion 
and recommendations in subsequent chapters on proposed 
changes in methodology. The chapter first describes the 
methodology followed in taking the 1980 decennial census. 
The description is not meant as a comprehensive account 
but as an overview to acquaint readers new to the 
decennial census process with the basic procedures and 
their chronology. Next, the discussion briefly refer- 
ences alternative procedures that were followed in 
previous U.S. censuses and related procedures used in 
other Western nations. 

The remainder of the chapter provides an overview of 
the Census Bureau's research and testing plans for the 
1990 census as currently formulated. The panel offers a 
general assessment of these plans and makes recommenda- 
tions directed to strategies for selecting priority 
projects. Subsequent chapters provide detailed recommen- 
dations on pretest and research plans in specific areas. 



1980 CENSUS METHODOLOGY 

It is convenient for descriptive purposes to divide the 
process for the 1980 census into eight components. These
are (roughly in chronological order): 

(1) Development of a master address list of 
residential housing; 

(2) Development of lists of special places, e.g.,
institutions; 

(3) Checking of address list prior to the census; 

(4) Enumeration; 

(5) Follow-up; 

(6) Coverage improvement; 

(7) Data processing; 

(8) Post-census evaluation. 

These eight headings give a quick overview of how the 
census was taken with the methodology of 1980: (1) a 
master address list of housing units was constructed from
a variety of sources; (2) other kinds of group housing
were added in; (3) these addresses were checked for
completeness and accuracy; (4) forms were then delivered
and collected by mail and by enumerators; (5) complete
responses were sought for incomplete questionnaires,
including forms that were completely blank, and for
questionnaires that were not returned; (6) alternative
enumeration methods were used to obtain responses from
hard-to-count elements of the population; (7) the
questionnaire data were converted into computer-readable
form, incomplete or inconsistent information was imputed,
and final census products (counts, cross-tabulations, and
sample public use microdata files) were created; and as a
last step, (8) the accuracy of the final set of records
was evaluated to inform users of the quality of the data
presented and to help design the next decennial census.
The following sections more fully describe each of these
eight components. The discussion draws heavily on Bureau
of the Census (1982b, 1983b), Bounpane (1983), and
National Research Council (1978).



Development of Master Address List of Residential Housing 

Building on the experience of previous censuses, the 
Census Bureau made the fundamental decision for 1980 to 
enumerate the vast majority of the population (about 95
percent of the total) using mailout-mailback procedures.
Use of the mails required construction of a comprehensive
address list. For purposes of this step in 1980, the
Census Bureau divided the United States into three types 
of areas: (1) mail areas for which the Census Bureau 
purchased commercial mailing lists, (2) mail areas for
which Census Bureau staff developed the mailing list, and 
(3) conventional (non-mailout) areas. 



Areas for Which Commercial Mailing Lists Were Used 

For urban areas that met certain requirements, namely
that (1) the Census Bureau had a computerized geographic
coding file for the area, (2) the area was located within
the Postal Service city delivery boundaries, and (3)
computerized commercial mailing lists were available for
the area, the Census Bureau purchased several of the more
complete and accurate commercial lists and used them to
develop a master tape address register (TAR). In New York City,
Philadelphia, and Chicago, the Census Bureau merged the 
1970 census master address list with the commercial 
mailing lists obtained and in New York City also merged 
the 1978 census dress rehearsal list into the master TAR 
list. Elsewhere the 1970 lists were not used. This 
procedure represented an extension of the 1970 process 
wherein the Census Bureau purchased only one mailing 
list. The 1980 TAR areas accounted for over 50 percent 
of all residences. 



Mail Areas for Which Commercial Mailing Lists Were Not 
Used 

For the remaining mail areas, which accounted for over 
40 percent of all residences, Census Bureau personnel 
"prelisted" each area, that is, compiled a list of 
addresses in the field. In 1980, field staff were 
instructed to "knock on every door with no callbacks,"
that is, conduct a physical canvass of all potential 
residences, including an attempted contact, to help 
determine whether the address was indeed a residence and 
was occupied. Where the canvassers could not make 
personal contact with occupants during this stage, they 
obtained information on occupancy status from neighbors, 
landlords, etc. 



Conventional (Non-Mailout) Areas

There were some areas of the United States for which
the Census Bureau felt it was more cost-effective to
enumerate by conventional means, i.e., by sending out an
enumerator to obtain a completed questionnaire instead of
asking residents to mail back a form. The enumerators
compiled the address list at the time of enumeration in
these areas, which contained about 5 percent of the
residences of the United States and were mostly thinly
populated.

Development of Lists of Special Places 

For the 1980 census, the Census Bureau compiled from a
variety of sources lists of so-called special places in
which people live in nonresidential settings, including
college dormitories, military bases, naval vessels,
hotels, motels, and shelters, and institutions such as
hospitals, nursing homes, and penitentiaries. The
population residing in special places is not insignificant
(about 3 percent of the total in 1980; Bureau of
the Census, 1982c:53), and such places can pose special
problems for obtaining a complete and accurate
enumeration.



Checking of the Master Address List Prior to the Census

After compilation of the master address list, the next
step was to implement several checks for accuracy and
completeness. In the TAR areas (urban areas for which
the Census Bureau purchased computerized commercial
mailing lists), U.S. Postal Service staff conducted three
checking operations and Census Bureau enumerators
conducted yet a fourth.

The Postal Service carried out an advance post office
check (APOC) in summer 1979; mail carriers checked address
cards for completeness and accuracy while following their
regular routes. Census enumerators made a second check
of the master address list in early 1980, the precanvass,
to verify that every address still existed and was
assigned to the correct geographic area. The enumerators
also added missed or newly built residential units to the
list. The Postal Service carried out the third and fourth
checks in the TAR areas just prior to Census Day, April 1.
In the casing check, three weeks prior to enumeration,
mail carriers received addressed census questionnaires 
with instructions to note any addresses to which they 
deliver mail for which they did not receive a question- 
naire. Finally, during the actual delivery of the census 
questionnaires, mail carriers again noted addresses for 
which they did not have questionnaires; this was the time of
delivery (TOD) check. In the prelist mail areas where
census enumerators had developed the address list, only 
the casing and time of delivery checks were performed. 
The various address check programs represented an 
expansion of similar programs that were conducted in 1970. 



Enumeration 

Enumeration, what is generally thought of when one hears
the word census, was the next step. In the mailout areas
of the country, mail carriers delivered the census ques- 
tionnaires two or three days prior to Census Day, April 
1, 1980. Each questionnaire included instructions for 
the respondents to fill out and mail back the completed 
form to the local district office. For most questions, 
respondents were to blacken circles that could be read by 
the Census Bureau's computerized data input system 
(FOSDIC) , while other questions required handwritten 
entries. In the 1970 mail census areas, questionnaires 
went out in the mail to 60 percent of residential
addresses and were received back in the mail from 86 
percent of the occupied residences. In 1980, 95 percent 
of addresses got questionnaires in the mail and 83 
percent of occupied households mailed them back. In most 
areas in 1980, five of every six households received 
short-form questionnaires containing a limited set of 
population and housing items; every sixth household 
received the long-form questionnaire containing the items 
asked of every household plus additional items asked just 
of the one-sixth sample. In places of under 2,500
population, one in two households received the long 
form. Overall, about 20 percent of households received 
the long form. 
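
The long-form figures above can be tied together with a little arithmetic. If households are treated as falling into just two groups, those sampled at one in two and those sampled at one in six, then an overall rate of about 20 percent implies a particular share of households in the heavily sampled group. The sketch below works this out; the two-group model and the function names are our simplification for illustration, and the resulting share is implied by the rounded 20 percent figure rather than reported in the text.

```python
from fractions import Fraction

# Sampling rates stated in the text.
RATE_SMALL_PLACES = Fraction(1, 2)   # places of under 2,500 population
RATE_ELSEWHERE = Fraction(1, 6)      # most other areas


def overall_long_form_rate(share_small_places):
    """Overall long-form rate for a given share of households in small places."""
    return (share_small_places * RATE_SMALL_PLACES
            + (1 - share_small_places) * RATE_ELSEWHERE)


def implied_small_place_share(overall_rate):
    """Share of households in small places implied by an overall rate.

    Solves overall = s * (1/2) + (1 - s) * (1/6) for s.
    """
    return (overall_rate - RATE_ELSEWHERE) / (RATE_SMALL_PLACES - RATE_ELSEWHERE)


share = implied_small_place_share(Fraction(1, 5))
print("Implied share of households sampled at one in two:", share)
print("Check of overall rate:", overall_long_form_rate(share))
```

Under these assumptions, roughly one household in ten would have been in an area sampled at the higher rate. Because the text says only "about 20 percent" and "in most areas," this should be read as an order-of-magnitude consistency check.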

As one of several experiments conducted as part of the 
1980 census in selected district offices, the Census
Bureau tested the use of a somewhat different procedure 
for delivering the questionnaires, called update list/ 
leave. In this procedure, enumerators instead of mail
carriers delivered the questionnaires, and at the same
time updated the address list (see Chapter 5 for some
results of this experiment).

In the conventional areas, the Postal Service delivered
unaddressed short-form questionnaires to all households
several days before April 1. Householders were instructed
to fill out their form and wait for an enumerator.
Beginning on Census Day, enumerators visited each household
and picked up a completed form or helped the
residents complete the form, at the same time compiling a
list of addresses. At designated households, enumerators
helped the residents complete the long-form questionnaire.

In both mailout-mailback and conventional areas,
specialized procedures were used to obtain questionnaires
(individual census reports) containing just the population
items from persons living in various types of group quar- 
ters, such as military bases, naval vessels, college 
dormitories, prisons, and hospital chronic wards. At 
places offering transient residence, such as hotels, 
motels, and missions, Census Bureau staff enumerated 
travelers who had no one at their usual home to count 
them and other persons with no usual place of residence. 



Follow-Up 

In the 1980 census mail areas, the first stage of follow- 
up began two weeks after Census Day. This stage concen- 
trated on obtaining questionnaires that had not been sent 
back to one of the 375 district offices (unit non- 
response) . Enumerators were instructed to return, a 
total of four times if needed, to residences that did not 
mail back a questionnaire. At the end of this process, 
enumerators as a last resort asked neighbors and landlords 
for any information that they might have about the resi- 
dents and completed basic demographic and housing items 
on the questionnaire. Census office staff also followed
up by telephone with households that mailed back an
incomplete questionnaire to obtain the missing information
(item nonresponse). In several district offices, on
an experimental basis, Census Bureau office staff followed 
up nonresponding households over the telephone using 
directories ordered by address (see discussion of this 
experiment in Chapter 6). 

A second stage of personal follow-up in the mail 
census areas began two to three months after Census Day. 
In this stage, census enumerators implemented several
specific coverage improvement procedures, described in
the next section, followed up the very small percentage
of nonresponding households (estimated at about 2
percent) for which not even "last resort" information was
obtained in the first stage, and also followed up for
missing items on otherwise complete questionnaires for 
which the earlier telephone follow-up was not success- 
ful. Follow-up operations conducted by the 37 district 
offices in conventional areas were similar to the second 
stage of follow-up in mail areas. Chapter 6 describes 
the 1980 census follow-up experience in more detail. 

District offices, on average, completed all follow-up
operations about five to six months after Census Day in
mail census areas and four to five months in conventional
areas. For a small percentage of housing units (less
than 0.5 percent of the total) from which questionnaires
were not obtained by the end of follow-up, the district 
office "closed out" the case. For some of these units, 
the office knew the household size, but for others there 
was no knowledge of whether the unit was occupied. 



Coverage Improvement 

Coverage improvement is a term encompassing several
different approaches to the collection of information
from households that were missed by the master address
list, or from individuals within otherwise-enumerated
households who were missed or elected not to respond.
The various address checks carried out prior to Census
Day were part of the coverage improvement effort for the
1980 census. In addition, Census Bureau staff implemented
several post-Census Day coverage improvement programs,
primarily in mail areas during the second stage of
follow-up:

(a) Checks Based on Responses to Coverage Questions;
Whole Household Usual Home Elsewhere Check. Enumerators
visited addresses at which a respondent in a small
apartment building (less than 10 units) reported more housing
units than listed for the structure on the master address
list, households that reported more residents on the
front of the questionnaire than on the inside pages (the
dependent roster check), households whose respondents
indicated some uncertainty about who was considered a
household member, and households with persons listed as
having their usual place of residence elsewhere to make
sure all households and persons were properly counted.
Whole households reporting usual residence elsewhere were
checked to be sure that the occupants were counted
once at their usual residence.

(b) Vacant/Delete Check. In 1980 a second independent
enumerator revisited every unit classified as vacant in
the first stage of follow-up (or at the time of enumeration
in conventional areas) to determine if the unit had
actually been occupied on Census Day and also to try to
identify and enumerate persons who moved into the unit
after Census Day who had not been enumerated at their
former residence. The Census Bureau implemented this
check in response to findings from the 1970 census
indicating that a nontrivial proportion of housing units
enumerated in the census as vacant was actually occupied.
In 1970, however, in contrast to the complete recheck of
vacancy status carried out in 1980, the Census Bureau
rechecked only a sample of units initially declared
vacant and used the results to carry out a computer
imputation for other units.

(c) Nonhousehold Sources Program. For areas with
large minority populations, the Census Bureau district
office staff clerically performed a cross-match between
census records and lists of names and addresses from
outside sources, including driver's license lists,
records of the Immigration and Naturalization Service,
and, in New York City, welfare records. Enumerators
visited addresses at which persons were identified in
the match who might have been omitted from the count.

(d) Prelist Recanvass. In mailout-mailback areas in
which Census Bureau staff developed the address list and
only two of the pre-Census Day address checks were
performed, the field staff rechecked the list for completeness
during the second stage of follow-up.

(e) Post-Enumeration Post Office Check (PEPOC). In
all conventional areas, mail carriers noted addresses
that did not appear to be on the address list; these
addresses were followed up by Census Bureau personnel.
This program was previously implemented in the 1970
census on a sample basis in rural areas of the South.

(f) Casual Count. In major urban areas, Census
Bureau field personnel visited places where persons who
had no fixed address or who were missed at their
residence might be found, such as skid row districts,
pool halls, employment offices, etc.

(g) Were You Counted. The Census Bureau had si
forms printed in news media inviting persons who believed
that they were missed by the census to complete them
and send them in. The district office staff checked their
records to see if persons sending in these forms were
already included.

(h) Local Review. The Census Bureau provided pre- 
liminary housing unit and population counts to local 
officials after completion of the first stage of follow- 
up. Officials reviewed the counts and indicated possible 
problem areas that were field checked and corrections 
made as needed. 
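
Several of the programs above, notably the Nonhousehold Sources Program and Were You Counted, rest on the same basic operation: comparing an outside list of persons against census records and following up on those who do not match. In 1980 this matching was done clerically. Purely as an illustration of the logic, the sketch below performs an exact match on a normalized name and address. The function names, the normalization rule, and the sample records are all hypothetical; real matching must cope with misspellings, nicknames, and address variants that this sketch ignores.

```python
def normalize(name, address):
    """Build a crude match key: lowercase text with whitespace collapsed."""
    return (" ".join(name.lower().split()), " ".join(address.lower().split()))


def unmatched_persons(census_records, outside_records):
    """Return outside-list entries that have no match in the census records.

    Each record is a (name, address) pair. Entries returned here are
    candidates for field follow-up, not confirmed omissions.
    """
    census_keys = {normalize(name, address) for name, address in census_records}
    return [
        (name, address)
        for name, address in outside_records
        if normalize(name, address) not in census_keys
    ]


# Invented sample data for illustration only.
census = [("Ana Diaz", "12 Elm St"), ("Bo Lee", "4 Oak Ave")]
license_list = [("ana  diaz", "12 ELM ST"), ("Cy Roe", "9 Pine Rd")]

follow_up = unmatched_persons(census, license_list)
print(follow_up)
```

An unmatched entry only indicates a person who might have been omitted, which is why the procedures described above sent enumerators to the address rather than adding the person directly to the count.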

The coverage improvement efforts for the 1980 census 
represented a considerable expansion in number and scope 
over the 1970 effort. In addition to the procedures 
described above for identifying missed persons and households,
programs aimed at increasing public cooperation,
particularly among hard-to-count groups, were greatly
expanded. The latter included special publicity efforts,
assistance centers that the public could call or visit
for help in filling out census forms, and the availability
of foreign-language questionnaires. Chapter 5 describes
the experience with coverage improvement in 1970 and 1980 
and provides program-by-program estimates of both cost 
and yield, or net additions to the count of population 
and housing units. 



Data Processing

The next step in the decennial census process was to take
the raw data collected from the enumeration, follow-up,
and coverage improvement stages, create computerized
household and person records, and edit these data records
prior to producing and distributing the final census
counts, cross-tabulations, and sample microdata files.
The reader should note that no names or addresses were
retained on the computerized files. In 1980 computer
editing of the raw data involved four steps: (1) imputation
for unit nonresponse, (2) imputation for item nonresponse
(see Appendix 3.1 for definition and description
of the sequential hot-deck imputation method used), (3)
weighting the records containing long-form data collected
from about 20 percent of the households by iterative
proportional fitting (see Appendix 3.2 for definition and
description), and (4) implementing various suppression
routines on the cross-tabulations and sample microdata
records to protect the confidentiality of individual
respondents' answers. In addition, clerks manually coded
handwritten responses to long-form questions on occupation,
industry, place of work, and other items, a step
that preceded computer processing of the long-form
information.

As mentioned above, for less than 0.5 percent of
addresses, census enumerators were not able to obtain
even last-resort information. For these close-out cases,
the Census Bureau, where necessary, first imputed the
occupancy status of the unit and then, for units designated
as occupied, "substituted," that is, imputed using
sequential hot-deck imputation (see Appendix 3.1), a
filled-out questionnaire from a randomly selected
neighbor. Of the total population count in 1980, [...]
percent represents persons who were imputed in this
manner (Bounpane, 1983:31). In addition, the Census
Bureau made substitutions for persons and housing units
for which the last-resort information obtained in the
second stage of follow-up was inadequate and, in a
few instances (0.1 percent of the total), for which the
questionnaire was inadvertently damaged during processing.
The total of substituted persons including close-out and
last-resort cases was 1.5 percent of the final 1980
population count (Bureau of the Census, 1983d:Table [...]).

For questionnaires with missing or inconsistent
information, computer programs allocated or assigned,
again using sequential hot-deck imputation, the responses
of a geographically nearby respondent with similar
characteristics as determined from the completed portions
of the census questionnaire. Some consistency edits did
not require hot-deck imputation but were made on the
basis of other information within the same data record.
About 10 percent of the total households in 1980 had one
or more short-form items imputed, and almost 45 percent
of people receiving the long-form questionnaire had one
or more items imputed (Bureau of the Census, 1983d:Table
B-4; 1983f:Table C-1; see also Citro, 1984).
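
The general idea of sequential hot-deck imputation can be
illustrated with a short Python sketch. It is a simplified
illustration only and does not reproduce the Census Bureau's
production procedure (see Appendix 3.1 for the definition).
The function name, the record fields, and the choice of
imputation classes are assumptions made for the example.
Records are processed in file order, which stands in for
geographic order, so that the donor is the most recently
processed respondent in the same class.

```python
def sequential_hot_deck(records, item, class_key, cold_deck=None):
    """Fill missing values of `item` from the last reported value in the same class.

    records   -- list of dicts in file (geographic) order; None marks a missing item
    item      -- name of the questionnaire item to impute (hypothetical field name)
    class_key -- function mapping a record to its imputation class
    cold_deck -- optional starting values per class, used before any donor is seen
    """
    hot_deck = dict(cold_deck or {})      # most recent reported value per class
    result = []
    for original in records:
        rec = dict(original)              # copy so the input file is left unchanged
        rec["imputed"] = False
        cls = class_key(rec)
        if rec.get(item) is None:
            if cls in hot_deck:           # a donor is available for this class
                rec[item] = hot_deck[cls]
                rec["imputed"] = True
        else:
            hot_deck[cls] = rec[item]     # this respondent becomes the current donor
        result.append(rec)
    return result


# Illustrative data: six adults listed in geographic order.
people = [
    {"id": 1, "sex": "F", "age_group": "adult", "marital": "married"},
    {"id": 2, "sex": "F", "age_group": "adult", "marital": None},
    {"id": 3, "sex": "M", "age_group": "adult", "marital": "single"},
    {"id": 4, "sex": "M", "age_group": "adult", "marital": None},
    {"id": 5, "sex": "F", "age_group": "adult", "marital": "widowed"},
    {"id": 6, "sex": "F", "age_group": "adult", "marital": None},
]
edited = sequential_hot_deck(people, "marital", lambda r: (r["sex"], r["age_group"]))
```

In this sketch a record with no earlier donor in its class
stays missing unless a cold-deck value is supplied; how such
cases were handled in practice is not described here.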

As mentioned before, about 80 percent of households
received the short form, while a sample of about 20 percent
received the long form. (The sample was selected
systematically rather than randomly, that is, every sixth
address, or, in places under 2,500 population, every
other address, was designated for the sample.) Both
types of forms included a common set of basic demographic
and housing questions. Publications and data tapes
containing just the short-form items were produced from
the entire set of census records (complete count), while
data products that cross-tabulated these items with
other items asked on the long form were produced in a
second pass of only the sample records. Without adjusting
sample weights, the marginal tabulations of basic
characteristics contained in the complete count and sample data
products would agree only within bounds of sampling error.
The Census Bureau forced these marginals to agree closely
through reweighting the sampled cases using a technique
called iterative proportional fitting (see Appendix 3.2).
Forcing agreement promoted consistency in the census
tabulations, reduced the variance of the estimates, and
also probably reduced any biases that may have occurred
in the sample selection.
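
The reweighting idea can be illustrated with a small Python
sketch of iterative proportional fitting (often called
raking). It is a minimal illustration under stated
assumptions, not the procedure documented in Appendix 3.2:
the function name, the two characteristics used, and all of
the numbers are invented for the example. Each pass scales
the sample weights so that the weighted totals for one
characteristic equal the complete-count totals, then does the
same for the next characteristic, and repeats until the
adjustments become negligible.

```python
def rake_weights(weights, margins, tol=1e-9, max_iter=100):
    """Adjust sample weights so weighted totals match complete-count margins.

    weights -- starting weight for each sample record
    margins -- list of (labels, targets) pairs; labels[i] is record i's category
               and targets maps each category to its complete-count total
    Assumes every category in `targets` occurs in the sample with positive weight.
    """
    w = [float(x) for x in weights]
    for _ in range(max_iter):
        largest_adjustment = 0.0
        for labels, targets in margins:
            # Current weighted total in each category of this characteristic.
            totals = {}
            for wi, label in zip(w, labels):
                totals[label] = totals.get(label, 0.0) + wi
            # Scale every record so its category total hits the target.
            for i, label in enumerate(labels):
                factor = targets[label] / totals[label]
                w[i] *= factor
                largest_adjustment = max(largest_adjustment, abs(factor - 1.0))
        if largest_adjustment < tol:      # all margins already agree
            break
    return w


# Hypothetical sample of four records, each starting with a weight of 6.
tenure = ["owner", "owner", "renter", "renter"]
age = ["under 65", "65 and over", "under 65", "65 and over"]
complete_count = [
    (tenure, {"owner": 14, "renter": 10}),
    (age, {"under 65": 18, "65 and over": 6}),
]
final_weights = rake_weights([6, 6, 6, 6], complete_count)
```

After fitting, the weighted sample totals for tenure and for
age in this toy example both reproduce the assumed
complete-count figures, which is the sense in which the
marginals are "forced to agree."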

Finally, prior to release of tabulations and data files
to the public, the Census Bureau implemented computer
programs to suppress information that might permit
identification of individual respondents. For example,
characteristics of minority populations in areas that had
fewer than 15 such persons were not released (Bureau of
the Census, 1982b:103-106).
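
A threshold rule of the kind given in the example above can
be sketched in a few lines of Python. The sketch is
illustrative only: the function name, the table layout, and
the area names are assumptions, and the actual suppression
routines covered more cases than this single rule.

```python
def apply_threshold_suppression(area_tables, threshold=15):
    """Withhold a group's characteristics in areas with too few group members.

    area_tables -- maps area name to {"persons": count, "characteristics": dict}
    Returns a dict mapping area name to the characteristics, or None if withheld.
    """
    released = {}
    for area, table in area_tables.items():
        if table["persons"] >= threshold:
            released[area] = table["characteristics"]
        else:
            released[area] = None         # fewer than `threshold` persons: suppress
    return released


# Hypothetical tabulation for one population group in two areas.
tables = {
    "Tract A": {"persons": 240, "characteristics": {"median_age": 31.5}},
    "Tract B": {"persons": 9, "characteristics": {"median_age": 44.0}},
}
public_tables = apply_threshold_suppression(tables)
```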



Post-census Evaluation 

The Census Bureau implemented a variety of programs to 
attempt to evaluate the quality of the census in 1980. 
Programs to evaluate the completeness of census coverage 
of the population, i.e., the completeness of the count, 
included the Post-Enumeration Program (PEP) and demo- 
graphic analysis. Other programs evaluated the quality 
of responses for particular content items. Chapter 4 
describes coverage evaluation programs carried out in 
1980 and prior censuses. 

The reader should not gain the impression from the 
above description that every step in the decennial census 
process flowed smoothly or was conducted exactly as 
planned. Each stage experienced problems, some of design 
and some of implementation. A major goal of the research 
and testing program that has begun for the 1990 census is 
to identify modifications to census methodology that 
promise to facilitate the census process and enhance the 
quality of census data. Before turning to a review of 
the Census Bureau's current research plans, the two 
sections that follow briefly review the highlights of 
methodologies used in previous U.S. censuses and in other 
Western nations to indicate the range of possibilities. 



METHODOLOGY USED IN PREVIOUS CENSUSES 

It is natural to begin this discussion with the 1950
census, which was the first population census in the
United States to have comprehensive programs for
evaluation of completeness of coverage. The 1950 census
(see Bureau of the Census, 1955) relied exclusively on
personal enumeration to obtain responses to census
questions. Enumerators went door-to-door with sheets
(line schedules) that had room to list 30 persons on the
front (one person to a line) and up to 12 housing units
on the back. Enumerators asked every fifth person an
additional set of questions and every thirtieth person a
few more questions, generating sampling rates of 100
percent, 20 percent, and 3.3 percent. There was no prior
compilation of an address list, although the Census
Bureau estimated total housing unit counts by block for
most cities of 50,000 or more population to use as a
check on the completeness of the enumeration. On an
experimental basis in 1950, the Census Bureau tested the
use of a list/leave self-enumeration procedure whereby
enumerators listed addresses and left questionnaires for
households to fill in and mail back to census district
offices. The Census Bureau also tested the use of
household instead of line schedules.

The 1960 census (see Bureau of the Census, 1966) used
a combination of mail and personal interview enumeration
techniques. In areas covering roughly 82 percent of the
population, enumeration involved a two-stage list/leave
procedure. In these areas, several days before April 1,
the Postal Service dropped off household questionnaires
called advance census reports (ACRs) that contained the
100 percent items. Residents were asked to fill in the
answers and wait to give the ACR to an enumerator.
Enumerators came to all households and transcribed the
100 percent items to computer-readable forms. If the
household had not answered the questions, the enumerator
obtained answers at that time. The best estimate is that
60 percent of households had the forms filled out and
waiting before the enumerator arrived. At every fourth
household, the enumerator left a long-form questionnaire,
which the household was to fill in and mail back to census
district offices. A different set of enumerators followed
up for sample questionnaires that were not returned
(about 20 percent of the sample) and for vacant units in
the sample. In the remaining areas of the country covering
about 18 percent of the population, the enumeration
involved a single-stage approach. The Postal Service
delivered unaddressed questionnaires. Enumerators visited 
each household and obtained answers to the 100 percent 
items and also to the sample items for designated house- 
holds. 

The 1970 census (see Bureau of the Census, 1976a)
foreshadowed in most respects the methodology adopted for
1980. This census extended the use of the mails in
conducting the enumeration. In areas of the country
encompassing roughly 60 percent of the population, the Postal
Service delivered questionnaires to all households several
days prior to Census Day with instructions to the
residents to complete and mail back the forms. Four-fifths
of the households received the short-form questionnaire,
while the other fifth received one of two versions of the
long form (one version was sent to 15 percent and the
other was sent to 5 percent of households). In the
remaining areas of the country covering roughly 40 percent
of the population, the Census Bureau used conventional
enumeration procedures similar to the single-stage
procedure used in rural areas in 1960. In one change
from 1960, the unaddressed short forms sent to households
in the conventional areas were already in computer-readable
format. On an experimental basis in 1970, the
Census Bureau tested use of mailout-mailback procedures
in selected areas that would otherwise have been
enumerated conventionally. The success of this experiment led
to the decision to expand the mailout-mailback procedure
to over 95 percent of households in 1980.

The 1970 census was the first to implement specific 
programs designed to improve coverage, including both 
checks of the master address list prior to Census Day and 
programs, such as a recheck of units classified as vacant, 
conducted after the first stage of follow-up. For 1980 
the Census Bureau greatly expanded the number and scope 
of coverage improvement programs. Unlike 1970, for which
two of the programs (the National Vacancy Check and the
Post-Enumeration Post Office Check) were implemented on a
sample basis, an early decision was made to carry out all
coverage improvement programs on a 100 percent basis in
1980.



METHODOLOGY USED IN OTHER WESTERN COUNTRIES 

There is a wide range of methodologies used to carry out
periodic censuses in other Western nations. The [...]
highlights [...] and Great Britain, France, the Federal
Republic of Germany, the Netherlands, Sweden, and Denmark.
(For the last six countries the [...] draws heavily on
Redfern, 1983.) Obviously, [...] that work in
one country may [...] another for [...]
reasons, such as different public perceptions and attitudes
or differences in population [...] and consequent
scale of census operations. [...] reviewing census
methodology [...] United States.

[...] Enumerators [...] questionnaires, while at the same
time compiling an address register. Several days [...],
enumerators [...] each household to pick up the [...]
questionnaires. Field operations [...] within two weeks
[...] Census Day. Australia uses [...] techniques to
evaluate the [...] the count [...] based on the results,
produces [...]

[...] instructions for householders to fill them in. [...]
the United States, 80 percent of the [...] receive a
short form and 20 percent a long form. In [...] significant
population concentration, householders are asked to mail
back their census forms, while in [...] populated areas,
enumerators call back to pick [...]. In both types of
areas, enumerators follow up for [...] item nonresponse.
Enumerators are held entirely responsible for conducting a
complete and accurate census in their districts; the same
[...] performs the initial list/leave and [...] operations.
Enumerators' work is [...] control. Most field work ends
in about three weeks.

[...] Enumerators, who are recruited, trained and [...]
rather than [...] in most countries, [...]
retrieve questionnaires in their areas. The enumerators 
for the most recent 1981 census completed field operations 
within about two months of Census Day. The questionnaire 
in 1981 included relatively few items: 16 questions for
each person and five questions on housing and cars. No
questions were asked on income, ethnicity, marital
history, or childbearing history. Sampling was not used in
the field, but responses to questions that required
manual coding, such as occupation, were processed on a 10
percent sample basis.

The recent 1982 census in France used conventional 
enumeration techniques as well. The enumerator staff, 
who were recruited and supervised by the local administrations,
collected data using three main forms: one for
each individual, for each housing unit, and for each
building. Questions asked of each individual were
relatively few in number compared with the United States;
for example, questions were not asked on ethnicity,
language, income, or journey to work. There was a
relatively large number of questions on housing. Most 
items, although obtained from 100 percent of the popula- 
tion, were processed on a 25 percent sample basis. The 
French are considering a system of mailout-mailback of a 
short form to every three in four households and using 
enumerators to obtain responses to a long form at the 
remaining one in four addresses. 

The Federal Republic of Germany last conducted a
census of buildings and houses in 1968 and a census of
population in 1970 with conventional enumeration techniques.
The 1970 census used two forms: a long form
administered in 10 percent of the enumeration districts
and a short form administered in the remaining 90 percent
of districts. The local communities played a major role
in the field work, recruiting and training enumerators,
checking the census returns against the local population
registers, and correcting one or the other set as necessary.
The federal government planned a combined population
and housing census for 1983, with a single form
containing a shorter list of questions than the 1970 long
form. However, public opposition to the census forced
the government to postpone it indefinitely. The opposition
stemmed from considerations of privacy and confidentiality
and specifically objection to the practice in the
1961 and 1970 censuses of using individually identifiable
census information to correct the local population registers
(see Butz, 1984, for a description of the
controversy).




The Netherlands most recently carried out a census of
population in 1971 administered by the municipalities,
which generated an initial address list from the local
population registers, recruited, trained, and paid the
enumerators, and used the census returns to update the
registers. The census operations were completed and results
published, but about 2.3 percent of the population failed
to cooperate as a consequence of public debate about
computers and privacy. The 1971 census had a separate
form for each person, and the questionnaire for heads of
households included about 60 items. The plans for the
1981 census specified important design changes, including:
(1) abandoning the practice of using census returns to
update the local registers (on the basis of results from
the 1971 census showing the registers to be very
complete); (2) obtaining demographic information from the
registers; and (3) administering a short form to four in
five addresses asking solely for the number of housing
units, households, and residents in each household, and a
long form to the remaining 20 percent of addresses
similar in length to the 1971 questionnaire. However,
public concern about confidentiality of the data and a
disappointing response to a pretest in 1979 led the
Central Commission on Statistics to recommend that the
1981 census be cancelled. In its place, the commission
acted to increase the size of the Labor Force Survey from
about 2.5 to 5 percent in spring 1981, to carry out a [...]
percent housing survey in fall 1981, and to obtain basic
demographic information from the population registers.

Sweden currently conducts a quinquennial census using
mailout-mailback techniques. For the most recent census
in 1980, forms were mailed to each person age 16 or over
and to each married couple with names and personal
reference numbers preprinted from the local population
registers. The form asked only for a list of adults
permanently living in the home and for details of the person's
labor force activity. The mail return rate in both 1975
and 1980 was about 97 percent. The statistics office
linked the returns to the population registers to obtain
demographic data including age, sex, marital status, and
citizenship and obtained data on housing from returns
made by owners of real estate for tax assessment purposes.
The Swedish government is actively pursuing the concept
of a completely register-based census but is encountering
considerable public concern.

Denmark has been the pioneer of a census based completely
on administrative registers rather than enumeration.
Denmark instituted local population registers
beginning in 1924 and in 1968 created an automated central
population register with a unique reference number for
each person. The 1970 census in Denmark was the last
conducted using enumeration techniques. In 1976 Denmark
used the central population register to obtain a set of
demographic data for all persons for statistical purposes.
In 1977, the national government created a central register
of buildings and dwellings, based on declarations
made by property owners for tax assessment purposes, and
made various other improvements in relevant administrative
records systems. The government used the following registers
to carry out a completely register-based census in
1981:

Central population register;

Central register of buildings and dwellings;

Registers of wages and salaries paid to each
employee as reported by employers to the tax
office;

Registers of income as returned by individuals to
the tax office;

Registers of employment insurance and unemployment
benefits;

Central register of enterprises and
establishments;

Register of educational achievements; and

Geographic address coding files.

Problems posed by this census methodology in Denmark
are numerous: (1) the central population register is
generally believed to be of high quality but contains
records for persons who have emigrated; (2) some data
items, such as means of travel to work, are not available;
(3) other items, notably occupation, have serious
reporting problems; and (4) there have been delays in
obtaining data from some registers, notably the tax
office records. The chief arguments made in its favor,
compared with traditional enumeration techniques, are that
costs and burden on the public are greatly reduced and
that data are available for items, such as income, that
were never included in conventional census questionnaires.
There has been very little public objection in Denmark to
the large-scale linkage of records involved in a
register-based census. No evaluation information exists on the
completeness and accuracy of the 1981 Denmark census.






Government and academic statisticians in the United
States have suggested modifications in this nation's
census methodology that would incorporate concepts and
procedures used elsewhere. The Census Bureau, as previously
described, tested use of a variant of the
list/leave/mailback technique in the 1980 census [...]
list/leave experiment. More extensive use of administrative
records has been proposed for purposes ranging
from address list construction to improvement of coverage
and selected content items (see Brown, 1984). [...]
Scheuren (1982) have advocated research on the [...]
an administrative records census and developed a preliminary
assessment of the coverage and subject [...]
that could be expected from existing administrative
records systems, such as Internal Revenue Service and
Social Security Administration records. Other [...]
modifications to census methodology in this country
include the use of sampling for obtaining the count and
adjustment of field counts for incompleteness [...].
The next section describes the Census Bureau's
plans for research and testing directed to the
methodology for the 1990 census.



CENSUS BUREAU RESEARCH PLANS FOR 1990 

The Census Bureau staff has been actively working since
1983 to design and implement a research and testing
program for the 1990 census. The staff prepared
research plans in late 1983 and early 1984 on the following
topics, each of which relates to an area of
interest to the Panel on Decennial Census Methodology
(the most recent version is cited in each case):

"Uses of Sampling for the Census Count" ([...]
et al., 1984) proposes research on several applications
of sampling for obtaining the count, including: [...]
the census with a large sample survey, following up
a sample of households that fail to mail back their
questionnaires, and implementing coverage improvement and
content verification programs on a sample basis.

"Research Plan on Adjustment" (Hogan, 198[...])
describes an ambitious and wide-ranging research [...]
directed toward improvement in methods for evaluating
census coverage and development of methods for adjustment
of census counts and investigation of their implications
for census data uses and users.




"Record Linkage Research Plan" (Jaro, 1984a) dis- 
cusses plans to develop automated procedures for matching 
records for use in coverage evaluation programs and other 
aspects of census methodology. This research plan is 
directed toward a critical problem area for most methods 
of coverage evaluation determining in an accurate and 
timely manner which persons captured in an independent 
survey or set of administrative records were or were not 
enumerated in the census. 

"Research Plan on the Uses of Administrative 
Records for the 1990 Census" (Brown, 1984) discusses 
possible uses of administrative records for coverage and 
content improvement and evaluation, content collection, 
special place enumeration, and as a replacement for the 
census. 

"Residence Rules for the 1990 Decennial Census" 
(Herriot and Speaker, 1984) reviews the rules of resi- 
dence that are used in the census to determine who should 
be counted and to assign persons to geographic areas. 
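
The matching problem addressed by the record linkage
research plan can be illustrated with a small Python sketch.
It is a toy illustration under stated assumptions and does
not represent the automated procedures under development at
the Census Bureau: the field names, the scoring weights, the
threshold, and the one-pass greedy assignment are all
invented for the example. Each survey record is compared
with census records from the same block, scored on name
similarity and agreement on age, and declared enumerated
only if the best candidate scores above the threshold.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0-1 similarity score for two names, ignoring case and outer spaces."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def match_to_census(survey_records, census_records, threshold=0.85):
    """Link each survey record to at most one census record in the same block.

    Returns a dict mapping survey id to the matched census id, or to None
    when no candidate reaches the (hypothetical) threshold.
    """
    links = {}
    already_linked = set()
    for s in survey_records:
        best_id, best_score = None, 0.0
        for c in census_records:
            if c["id"] in already_linked or c["block"] != s["block"]:
                continue
            # Ages within one year of each other count as agreeing.
            age_agrees = 1.0 if abs(s["age"] - c["age"]) <= 1 else 0.0
            score = 0.7 * name_similarity(s["name"], c["name"]) + 0.3 * age_agrees
            if score > best_score:
                best_id, best_score = c["id"], score
        if best_id is not None and best_score >= threshold:
            links[s["id"]] = best_id
            already_linked.add(best_id)
        else:
            links[s["id"]] = None         # treated as not found in the census
    return links


# Hypothetical records for one block.
survey = [
    {"id": "s1", "name": "John Smith", "age": 34, "block": "101"},
    {"id": "s2", "name": "Maria Lopez", "age": 27, "block": "101"},
]
census = [
    {"id": "c1", "name": "Jon Smith", "age": 35, "block": "101"},
    {"id": "c2", "name": "Robert Jones", "age": 52, "block": "101"},
]
links = match_to_census(survey, census)
```

In this example the first survey person is linked to the
census record with the slightly different spelling, and the
second is left unmatched. A realistic system would also have
to deal with errors in geographic coding, duplicate records,
and cases too uncertain to resolve automatically.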

The first field activities directed toward the 1990 
census involved tests of alternative methods of compiling 
address lists in urban and rural areas that were conducted 
in several localities in spring 1984 (Bureau of the 
Census, 1984b). Concurrently, the Census Bureau staff
developed specific plans for the first full-scale pretests 
to be carried out in spring 1985 (Bureau of the Census, 
1984b). In this round of tests, the Census Bureau 
experimented with various automated procedures to improve 
census operations in Tampa, Florida. In addition to 
testing use of these automated procedures in a second 
location, Jersey City, New Jersey, the Census Bureau 
conducted a test of a two-stage census operation in the 
latter city. 

The two-stage procedure involved collecting only 
short-form information from all housing units in the 
first stage and later contacting a sample of housing 
units during an administratively separate second stage 
for long-form information. Households in the second- 
stage sample were asked to respond to all the short-form 
items once again, in contrast to the procedure used in 
1960, wherein respondents were asked to repeat only name 
and relationship for each household member. 

In conjunction with the Tampa pretest, the Census 
Bureau is conducting a post-enumeration survey of a sample 
of blocks as part of its research and testing program on 
coverage evaluation methods. The test will include an
administrative records match for two typically
hard-to-count groups, minority males ages 18-40 and minority
children under age 10.

In summer and fall 1984, Census Bureau staff began to
develop goals for a much more extensive pretest program
to be carried out in spring 1986 (see Matchett, 1984;
Johnson, 1984). The 1986 pretest objectives incorporated
some of the ideas outlined in the research plans cited
earlier and omitted others. Subsequently, some changes
were made to the pretest plans (see Bureau of the Census,
1985b), but most of the objectives initially identified
were retained. The process for planning the 1990 census
pretests is actively ongoing; the description that follows
summarizes the main features of the pretest objectives
for 1986 as they were defined in spring 1985.

Effort is to be directed in 1986 toward tests of
specific methods and procedures in the following areas:

* Feasibility of Adjustment-Related Operations.
This area includes tests of coverage evaluation based on
pre-enumeration and post-enumeration surveys of sample
blocks in an urban test site and a post-enumeration survey
in a rural site. The plans include using the results
from the urban post-enumeration survey to simulate all
operational aspects of carrying out a full-scale
adjustment of the census figures for the urban site by the end
of 1986. (See Chapters 7 and 8 for additional
description.)

* Automation. The Census Bureau proposes to test
two major processing alternatives: (1) a system of
separate collection and processing offices for use in
urban areas and (2) a system of local offices that
combine collection and processing for possible use in
rural and selected suburban areas. The urban test will
include experiments with different data entry techniques.
In all instances, the intent is to develop automated
processing systems that provide greater management control
of the questionnaires and of the address list and that
permit entry of responses into computer-readable form on
a flow basis. In contrast, the 1980 census local district
offices relied exclusively on clerical staff to manually
check in and review questionnaires, update the address
list, and perform other operations. Questionnaires were
sent in batches to one of three centers for data entry
and computer processing.

* Native American Enumeration Techniques and
Procedures. The Census Bureau proposes to test various
methods to improve coverage and accuracy of enumeration
on American Indian reservations, including obtaining 
tribal rolls and designating tribal liaisons, modifying 
the training procedures for indigenous enumerators, and 
advancing travel expenses for enumerators. 

* Rural Area Techniques and Procedures. This area
includes testing alternative methods of improving ques- 
tionnaire delivery and coverage in rural areas that were 
conventionally enumerated in 1980 and also in prelist 
areas for which Census Bureau staff developed the mailing 
list in 1980 rather than working with commercial lists. 
(See Chapter 5 for further description.) 

* Coverage Improvement. This area includes tests
to improve the effectiveness of at least a dozen coverage
improvement procedures that were used in 1980, such as 
address checks, a program to recheck the status of units 
originally classified as vacant, a program to check 
administrative records against census returns to identify 
possibly uncounted persons, local review of preliminary 
census counts, and others (see Chapter 5). 

* Enumeration Methods for Multiunit Structures With 
Mail Delivery Problems. Procedures proposed for testing 
include refining the various checks that are conducted of 
the mailing list to identify likely problem addresses 
(for example, buildings with a central mail drop) and to 
use an update list/leave procedure for multiunit struc- 
tures with delivery problems, for which census enumer- 
ators rather than Postal Service staff deliver the 
questionnaires and update the mailing list at the same 
time. (See Chapter 5 for further description. The
update list/leave test may be deferred until 1987.) 

* Follow-up Procedures. Included in this area are 
proposed tests to use telephone follow-up for households 
that do not mail back questionnaires and to use computer- 
assisted telephone interviewing of households whose 
questionnaires fail one or more edits. (A proposal 
originally included to test the use of sampling for 
follow-up of households that do not mail back
questionnaires was dropped; see Chapter 6.)

* Geographic Support System. Various tests are 
proposed of aspects of the geographic support system, 
including the address control files, maps, and geocoding 
files that assign addresses to pieces of census geography. 

* Outreach. The Census Bureau proposes to test a 
number of ideas for improved outreach and advertising for 
the decennial census. 




* Questionnaire Design and Content. This area,
like coverage improvement, includes a large number of
ideas and procedures for testing, such as: a general-
purpose follow-on survey of short-form households about
two months after Census Day; alternative race and
ethnicity questions; questions about noncash income;
questions about second residences to help minimize
overcounting and undercounting; the use of a structure
questionnaire to ask some housing items of a knowledgeable
respondent, such as the building manager instead of the
household. (See further description in Chapters 5 and [...])

* Tabulation and Publication Systems. This area
includes tests of procedures to improve processing of
short-form tabulations that are produced for local review
and for redistricting use by the states.

* Work Force Issues. This area includes tests of
ways to improve the selection, retention, and productivity
of enumerators, for example, using teams of enumerators
in hard-to-count areas.

The concepts and procedures proposed for testing
listed under each of the above headings represent those
that remained after a prior selection process. Moreover,
although the staff originally assigned the objectives to
three priority categories, top Census Bureau planners
have indicated that their intent is to request funding
for all the 1986 pretest objectives. The rationale is
that there are few opportunities to test new or [...]
census procedures and hence that the Census Bureau [...]
ideas as [...] include efforts, [...]

[...] the Census Bureau prepared a [...]
and proposing a specific research program for coverage
evaluation in 1990 and possible adjustment of the census 
counts (Wolter, 1984). In brief, this paper described
the Census Bureau's plans to develop and test a design
for a post-enumeration or possibly pre-enumeration sample
survey to use as the major coverage evaluation program
providing information that could be used for adjustment.
The paper indicated that the currently preferred design
is for an independent survey, instead of an existing data
collection vehicle such as the Current Population Survey,
and for a compact area cluster sample, that is, a sample
including all residences within selected small geographic
areas, such as city blocks, as opposed to a list sample.
The Census Bureau explicitly ruled out using the reverse
record check methodology or administrative list matching,
except possibly as an adjunct to the independent survey.

The paper also described plans to design and test 
operational procedures that could be used to adjust the 
census results. The paper stated that the Census Bureau's
goals are to develop procedures that, if successful, 
would permit adjustment of all census figures, including 
the population count and characteristics, in time for 
delivery of adjusted state population counts to the 
President by December 31, 1990, and in a manner such that 
the individual micro records could be aggregated in any 
possible way for tabulations and analysis. The paper 
acknowledged that development of satisfactory coverage 
evaluation and adjustment procedures would require many 
important improvements in methodology, including success- 
ful implementation of a fast and accurate computer 
matching program. (See further discussion in Chapters 7 
and 8.) 



ASSESSMENT AND GENERAL RECOMMENDATIONS 

The panel believes it can contribute to the choice of 
methodology for the 1990 census by providing a careful 
critique of the Census Bureau's research and testing 
plans. How well the Census Bureau designs its research 
and testing program will crucially affect its success in 
improving accuracy and timeliness of the 1990 census 
while containing costs. 



Review of 1985 Pretest Plans

The panel's interim report, which was prepared to provide
early guidance to the Census Bureau regarding proposed
research and pretest plans, commented extensively on
several aspects of the 1985 pretest plans, particularly
the two-stage census pretest in Jersey City. The panel,
on balance, did not support this methodology and recom-
mended that research be carried out based on prior cen-
suses before reaching a decision to commit resources to a
field test (see National Research Council, 1984:Ch.3).
The Census Bureau field staff suggested that a two-stage
procedure would make it possible to speed collection of
the count in the first stage and thereby significantly
improve the timeliness of the basic information. The
Census Bureau believed it was important to obtain an
early determination of the likely gains in timeliness
from a two-stage procedure and, hence, proceeded with the
test as planned.

The panel did not scrutinize plans for the Tampa
pretest of automation procedures because the panel is not
specifically addressing operational aspects of the
decennial census relating to field control of the address
list, data entry, and so on. However, the panel supports
efforts by the Census Bureau to develop improved automated
procedures that have the potential to speed up data col-
lection, improve accuracy, and reduce costs. The panel
also supports efforts to automate matching operations 
that may be used in coverage evaluation and coverage 
improvement programs. 

The panel commented in the interim report on the 
coverage evaluation tests being conducted in Tampa and on
other research in progress related to coverage evaluation
and adjustment. Chapters 7 and 8 of this report comment 
further. 

Finally, the panel recommended in the interim report
that a question asking parents for names and addresses of
children not residing in the household receive early
testing as a coverage improvement measure (National
Research Council, 1984:24). At present such a question
is being considered for testing in 1987 (see further
discussion in Chapter 5).



Review of 1986 Pretest Plans 

For this report the panel reviewed the Census Bureau's 
descriptions of proposed 1986 pretests and the proposed 
coverage evaluation and adjustment research program, 
along with the research plans listed earlier and other 
documents. We provide below an overall assessment of the
Census Bureau's 1990 research and testing planning
process and recommend strategies for choosing priority
projects. Subsequent chapters present recommendations on
pretest and research plans in specific areas.

The panel has several major concerns with the research
and testing program outlined for 1986. These concerns
relate to the time schedule for planning the 1990 census,
budget and staff resources, and the emphasis given to
field testing over other kinds of research.

The panel has noted elsewhere that there is not much
time to get ready for 1990. On the face of it, this
reality may appear to argue for the need to test as many
ideas as possible as early as possible. On the contrary,
however, the panel suggests that it is likely to be self-
defeating to try to handle a very large and many-faceted
testing program. To be useful for making timely decisions
on census methodology, test data must be obtained,
analyzed, assessed, and discussed and the findings used
to design subsequent tests. This process is itself
time-consuming and requires ample staff and other
resources (such as computer resources). If too many
studies are planned for a testing cycle, there is a
danger that there will not be sufficient time to obtain
and assimilate results from more than a fraction of the
tests for use in planning further studies or in making
choices of methodology to use for the census.

Moreover, field tests are very resource-intensive, and
budget resources and staff time devoted to designing and
implementing a wide range of pretests are likely to take
away from budget resources and staff time available to
obtain and digest the pretest results. Even though ample
funds may have been allowed for the analysis phase, these
funds are typically more at risk of diminution than the
funds for the actual tests themselves. If the costs of
testing exceed estimates, as frequently happens, the most
likely outcome is a reduction in budget available for
analysis.

The panel believes that the Census Bureau should give
greater recognition to the problems involved in a large-
scale testing program posed by the constraints of
calendar and staff time needed to evaluate and assimilate
the results. We believe that the Census Bureau will need
to pare back its 1986 testing program if key data are to
be analyzed in time to support major decisions. The
program outlined appears too ambitious for the time
remaining before the census and for the staff resources
likely to be available.




The Census Bureau should exercise greater selectivity
in several ways. First, the planning staff should
carefully review all the proposed pretests to determine
if some ideas should be dropped from the research and
testing program entirely. We recommend a strategy of
identifying the more promising projects and pursuing only
those projects from the top of the list that fit the
overall time and resource constraints, even though this
entails the risk that useful ideas will be ignored. Some
ideas that cannot be accommodated in the 1990 research
program should be considered for testing on an experi-
mental basis in the 1990 census itself with a view toward
further improvements in methodology for the year 2000.

Second, Census Bureau staff should determine if there
are useful ideas that can be pursued without requiring
the time and expense of full-scale pretests. There are a
number of projects listed in the Census Bureau's pretest
package that we believe can be researched with much less
expense and effort via other methods, such as thorough
review of the Census Bureau's own previous tests and
research. The panel suggests elsewhere in the report
projects for which the Census Bureau could usefully carry
out research in 1986 that does not involve field tests of
the kind planned for 1985 and 1986. One example is
investigation of the feasibility of using administrative
records to obtain improved housing structure data (see
the discussion in Chapter 6).

Moreover, research other than field tests carried out
in 1986 could be very helpful for designing pretests for
1987. For example, research on new questions or alter-
native question wording could be carried out initially by
means of focus groups and laboratory experiments, in
addition to the National Content Test (a large mail
survey) planned for 1986. The 1986 field tests should
include tests of questions related to coverage improve-
ment (see the discussion in Chapter 5) but could well
omit other question tests in order to simplify the
logistical problems and costs of fielding the tests.
Results from the National Content Test and small group
research carried out in 1986 could suggest further
question tests for the 1987 field program.

Finally, there may be proposed tests of procedures
that do not need to be conducted until the 1988 dress
rehearsals. For example, one proposed project is to test
automated searching and updating for persons found in the
Casual Count operation. This operation was low in cost
in both 1970 and 1980 but also low in yield in terms of
number of persons added to the count. Assuming it is
worthwhile to continue the program, it does not appear
that the program merits extensive testing. It could be
omitted from 1986 and 1987 tests and incorporated into
the dress rehearsals, which will include every procedure
planned for 1990; an advantage of this approach is that
by 1988 the Census Bureau should have made a decision on
the type of automation system that it will use in the
field.

Recommendation 3.1. We recommend, to ensure
cost-effective field testing and preservation of
adequate resources for analysis, that the Census
Bureau attempt to identify research and testing
proposals for 1986 that:

(a) Can be pursued with other research methods and
omitted from the 1986 field test program;

(b) Can be safely deferred for research or testing
until 1987 or until the dress rehearsals;

(c) Are unlikely to be viable for 1990 but should be
incorporated on an experimental basis into the
1990 census as a test for future censuses; and

(d) Should be omitted entirely from consideration for
the 1990 census, based on previous census
experience or other survey research results.

In Chapters 5 through 8 we comment on the Census 
Bureau's proposed research and testing program in specific 
key areas of census methodology related to the panel's 
charge, including: coverage improvement methods (Chapter
5), uses of sampling and administrative records (Chapter
6), adjustment methods (Chapter 7), and coverage evalu-
ation methods (Chapter 8). The reader should note that,
given the particular nature of its charge and its exper- 
tise, the panel did not undertake to review many other 
important aspects of census methodology, such as enumera- 
tion procedures, geographic support systems, and data 
entry procedures. 

Chapters 5 through 8 provide specific recommendations 
of ideas and procedures that the panel regards as high 
priority for research and testing as soon as possible, as 
well as ideas that the panel believes can safely be given 
a lower priority or show little promise and should be 
dropped. The panel's recommendations generally indicate 
a preference for the use of less resource-intensive
research methods whenever possible and appropriate.
The panel's recommendations in many instances call for
the Census Bureau to complete studies or reanalyze data 
that are already available from the 1980 census and the 
experiments and pretests conducted for 1980. 

In general, the panel believes that research with
existing data is likely to result in important additions
to knowledge at low cost compared with
other methods. Obviously, more expensive methods,
including full-scale field tests, are required to develop
the methodology for 1990, but the research and testing
program should provide resources to exploit existing data
as well.

Recommendation 3.2. We recommend that the Census
Bureau make full use of data from the 1980 census and
from experiments carried out in 1980 to help guide
planning for 1990. To this end, we recommend that the
Census Bureau assign a high priority to completion of
1980 census methodological studies, and we encourage
further analysis of these data when appropriate.



APPENDIX 3.1 
AN OVERVIEW OF SEQUENTIAL HOT-DECK IMPUTATION 

The Census Bureau makes use of an extremely sophisticated 
sequential hot-deck imputation to correct for item non- 
response in the decennial census. We briefly describe 
some of the features of this system. Due to its com-
plexities, we do not attempt a complete description; see
Bureau of the Census (1983e) for further information.

The individual records are processed sequentially.¹
At the start of this process, an imputation table exists 
that has initial values stored in it for use with various 
combinations of nonresponse. For example, when, at the 
early stages of this process, a record is encountered 
with age and sex missing but race, etc., responded to, 
this table will have an entry that will give reasonable 
values for the age and sex of an individual with similar 
characteristics. However, as more complete (or at least 
more complete than the nonresponse represented by entries 
in the imputation table) questionnaires are processed, 
substitute values are continually used to replace the 
values in the imputation table. The benefit of this 
substitution arises from the geographic continuity 
implied by the processing of the census questionnaires. 
The closer the donor respondent is to the nonrespondent 
in the census processing, the closer the two are likely 
to be geographically. This procedure amounts to the use 
of detailed geographic stratification for imputation 
purposes. 
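
The sequential procedure described above can be illustrated with a minimal sketch. It is not the Census Bureau's production system: the function name, the record layout, the single imputed item ("age"), the single class variable ("sex"), and the starting values are all assumptions made for illustration. The sketch shows only the core idea, that each respondent overwrites the stored value for its class and each nonrespondent borrows whatever value is currently stored.

```python
def sequential_hot_deck(records, target, class_fields, initial_deck):
    """Fill missing `target` values (None) from the most recent donor.

    records      -- list of dicts, already in processing (geographic) order
    target       -- name of the item to impute, e.g. "age" (hypothetical)
    class_fields -- items defining the imputation class, e.g. ["sex"]
    initial_deck -- starting values keyed by class tuple; every class that
                    can occur must have an entry (an assumption of this sketch)
    """
    deck = dict(initial_deck)  # the current imputation table
    completed = []
    for record in records:
        key = tuple(record[field] for field in class_fields)
        filled = dict(record)
        if record[target] is None:
            # Nonrespondent: borrow the value currently stored for this class.
            filled[target] = deck[key]
            filled["imputed"] = True
        else:
            # Respondent: becomes the donor for later nonrespondents.
            deck[key] = record[target]
            filled["imputed"] = False
        completed.append(filled)
    return completed


# Hypothetical records, listed in processing order.
records = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": None},  # no male donor yet, so the initial value is used
    {"sex": "F", "age": None},  # borrows 34 from the first record
    {"sex": "M", "age": 51},
    {"sex": "M", "age": None},  # borrows 51 from the preceding record
]
initial_deck = {("F",): 30, ("M",): 40}

completed = sequential_hot_deck(records, "age", ["sex"], initial_deck)
ages = [r["age"] for r in completed]
flags = [r["imputed"] for r in completed]
```

Because donors are replaced as processing moves along, a nonrespondent tends to receive a value from a nearby household, which is the geographic stratification effect noted in the text.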

Although the above description gives the fundamentals 
of the sequential hot-deck procedure used by the Census 
Bureau in imputing for the decennial census data set, 
there are several further complications, two of which we 
touch on here. Both of these complications relate to the 
difficult problem of using an imputation mechanism that 
will produce a "consistent" data set when the process is 
finished. These two examples may give some idea of the 
magnitude of the problems encountered in devising an 
imputation procedure for the decennial census data set. 



¹ The short-form records and long-form records of the
decennial census are treated separately. However, the
differences between the imputations for the two forms are
only of degree and not of kind.

First, it is important to understand that the order of 
imputation of variables is key, since there is a strong 
dependence between the answers given on the census ques-
tionnaire. For example, consider the situation of yes-no
questions followed by further responses if the answer to 
the previous question was yes. 

Second, consider the case of imputing age of spouse. A
simple-minded suggestion would be to substitute the age
of spouse for a similar respondent. Unfortunately, it is
quite easy to impute the "existence" of situations that
one would consider to be rather unlikely, such as spouses
substantially older or younger than their mates. This is
something that one might characterize as weak inconsis-
tency. To avoid the above possibilities, the Census
Bureau imputes so that the difference between the spouse's
and his or her mate's ages is substituted. (One might
also consider substituting based on the ratio of their
ages.) This lessens the problem of having spouses and
mates of vastly different ages. However, it does not
necessarily address the difficulty of spouses with
siblings older than their mother or father. Single cases
such as age of spouse, if identified, can be treated.
However, these possibilities must be noticed so that the
need for these additional features in the imputation
process is appreciated. Fellegi and Holt (1976) provide
a solution to the problem of consistency of imputations
whether carried out one variable at a time or in a
multiple mode.

One of the major motivations for the use of hot-deck
imputation is that imputation of averages and zeros, as
well as other types of cold-deck methods, which are
relatively effective as far as estimates of means and
central tendencies of the data set are concerned,
severely distorts the remainder of the distribution,
especially the variance, of the affected variables. This
is because the values imputed are far less variable than 
the observed responses would have been. This is 
especially true of imputation of averages, whose use 
clearly results in a reduction of the estimate of the 
variance of any estimate based on the data set with 
imputations. Hot-deck imputation avoids this by imputing 
typical values from the raw data set, thereby attempting 
to mimic the variance of the hypothetical complete data 
set. 
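
The variance argument can be checked on a small made-up example. The sketch below is illustrative only: the data values, the function names, and the simple "previous reported value" donor rule are assumptions, not the Census Bureau's procedure. It compares the variance of the reported values with the variance after imputing the mean and after imputing from the preceding respondent.

```python
import statistics


def mean_impute(values):
    """Replace each missing value (None) with the mean of the reported values."""
    observed = [v for v in values if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in values]


def previous_donor_impute(values, start):
    """Replace each missing value with the most recent reported value.

    `start` is a hypothetical initial value, used only if the first
    record in the sequence is missing.
    """
    last = start
    completed = []
    for v in values:
        if v is not None:
            last = v
        completed.append(last)
    return completed


# Hypothetical item with every second response missing.
values = [10, None, 20, None, 30, None, 40, None]
observed = [v for v in values if v is not None]

var_observed = statistics.pvariance(observed)                       # 125
var_mean = statistics.pvariance(mean_impute(values))                # 62.5
var_hot_deck = statistics.pvariance(previous_donor_impute(values, start=25))  # 125
```

In this contrived case mean imputation halves the variance, because half of the completed values sit exactly at the mean, while the donor-based values reproduce the spread of the reported data. Real data will not behave this neatly, but the direction of the effect is the one the text describes.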

A relatively recent advance, termed multiple imputation
(Rubin, 1978), which represents an expansion of the simple
imputation strategy, may often lead to more accurate
inference than single imputation. In his paper, Rubin
demonstrates that at least in some simple situations,
e.g., estimating a mean from a simple random sample with
some random nonresponse, this generalization of imputation 
gives rise to an unbiased estimate of the variance of the 
sample mean. 

Ideally, it is highly desirable to control the level 
of imputation (by achieving high levels of good response) 
so that imputation will be more for user convenience than 
to affect the estimated mean and variances of the 
variables concerned. 



APPENDIX 3.2 
A DESCRIPTION OF ITERATIVE PROPORTIONAL FITTING

One component of the census process briefly described in
Chapter 3 is the data processing component, which includes
a step to relate the long-form to the short-form informa-
tion. The long-form information published by the Census
Bureau is acquired on a sample basis. However, for the
subset of variables that also appear on the short form,
information is available for all respondents. In order
to promote consistency between the short-form and the
long-form tabulations as well as to reduce variance and
any sample biases, the Census Bureau adjusts some of the
sample information so that the sample estimates agree
with the 100 percent information. Iterative proportional
fitting has been used by the Census Bureau since 1970 to
accomplish this. Iterative proportional fitting uses the
100 percent information at an aggregate level (that is,
cross-tabulations for geographic "weighting" areas of
broad categories of some of the short-form variables; see
Bureau of the Census, 1983f) to weight the individual
long-form records.

In a more general context, the objective of iterative
proportional fitting is to allocate population totals for
aggregated groups down to individual records by weighting
the individual records so that totals for individuals
over the subgroups agree with population totals for the
aggregates. Therefore iterative proportional fitting has
potential for carrying down information from coverage
evaluation programs, which is necessarily collected at an
aggregate level, to the individual record level of the
decennial census data set. This would ensure that the
adjusted data set would be consistent, in a manner
described in Chapter 7, and would also have smaller
variances. Consider the two-way table given in Table
3.1. The $n_{i+}$ and $n_{+j}$ in Table 3.1 are the sample
row and column totals (for example, from the long form).
The $m_{i+}$ and $m_{+j}$ are the row and column totals
from the superior source (for example, the short form).
The problem is to use the $m_{i+}$ and $m_{+j}$, the row
and column marginal totals, to adjust the elements of the
table.

TABLE 3.1 Notation for Iterative Proportional Fitting for a Small Two-Way
Table

                  Demographic  Demographic  Demographic  Sample    Population
                  Group 1      Group 2      Group 3      Total     Total

Age group 1       $n_{11}$     $n_{12}$     $n_{13}$     $n_{1+}$  $m_{1+}$
Age group 2       $n_{21}$     $n_{22}$     $n_{23}$     $n_{2+}$  $m_{2+}$
Age group 3       $n_{31}$     $n_{32}$     $n_{33}$     $n_{3+}$  $m_{3+}$
Sample total      $n_{+1}$     $n_{+2}$     $n_{+3}$     $n_{++}$
Population total  $m_{+1}$     $m_{+2}$     $m_{+3}$               $m_{++}$

Iterative proportional fitting was first proposed in
Deming and Stephan (1940). The first step of this
algorithm reweights the entries in column 1 by the factor
$m_{+1}/n_{+1}$. Then, assuming the values throughout all
columns in the table have been altered in this manner, a
new table of $n_{ij}$ is created and the same operation is
performed by rows, etc. After each iteration, each
individual cell has been assigned a weight that applies 
to each member of the cell. The iteration proceeds until 
convergence (see Fienberg, 1970). The procedure can also
be applied to multiway tables of more than two 
dimensions. Iterative proportional fitting will, in 
general, reduce the variance of the resulting single cell 
estimated totals. 

Because the adjustment factor uses row and column 
totals from the sample data in the denominator, a zero 
row or column total will clearly require a modification 
to allow the resulting estimates to be finite. It is 
common practice to combine adjacent rows or columns if 
one row or column total is zero or small. If there are 
many zero cells in the interior of the table, the rate of 
convergence may be adversely affected. 
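
The algorithm described above can be written as a short sketch for the two-way case. The function name, the tolerance, the iteration cap, and the example table are assumptions made for illustration; the column-then-row order follows the description in the text. Zero sample totals raise an error here rather than being handled by collapsing rows or columns, so any collapsing would have to be done before the call.

```python
def iterative_proportional_fit(sample, row_targets, col_targets,
                               tol=1e-9, max_iter=1000):
    """Adjust a two-way table of sample counts to known marginal totals.

    sample      -- list of rows of sample counts n_ij
    row_targets -- population row totals m_i+
    col_targets -- population column totals m_+j (same grand total as rows)
    Returns the fitted table as a list of lists of floats.
    """
    cells = [[float(value) for value in row] for row in sample]
    n_rows, n_cols = len(cells), len(cells[0])

    for _ in range(max_iter):
        # Column step: scale column j by m_+j / (current column total).
        for j in range(n_cols):
            col_total = sum(cells[i][j] for i in range(n_rows))
            if col_total == 0:
                raise ValueError("zero column total; combine columns first")
            factor = col_targets[j] / col_total
            for i in range(n_rows):
                cells[i][j] *= factor

        # Row step: scale row i by m_i+ / (current row total).
        for i in range(n_rows):
            row_total = sum(cells[i])
            if row_total == 0:
                raise ValueError("zero row total; combine rows first")
            factor = row_targets[i] / row_total
            cells[i] = [value * factor for value in cells[i]]

        # Rows now match exactly; stop once the columns match as well.
        worst_gap = max(
            abs(sum(cells[i][j] for i in range(n_rows)) - col_targets[j])
            for j in range(n_cols)
        )
        if worst_gap < tol:
            break
    return cells


# Hypothetical 2 x 2 sample table and population margins (both sum to 100).
sample = [[10, 20], [30, 40]]
row_targets = [40, 60]
col_targets = [45, 55]

fitted = iterative_proportional_fit(sample, row_targets, col_targets)

# Weight for every sample record in cell (i, j): fitted total / sample count.
weights = [[fitted[i][j] / sample[i][j] for j in range(2)] for i in range(2)]
```

The weights in the last line are what would be attached to individual long-form records in a cell, which is the use of the procedure sometimes called raking ratio estimation.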

Iterative proportional fitting provides the user with 
weights, which can be used to construct estimates of 
other characteristics. When iterative proportional 
fitting is used for the purpose of assigning weights, and 
not merely for adjusting tables of cross-classified 
counts, it is often called raking ratio estimation. 
Iterative proportional fitting is a generalization of 
synthetic estimation (described in Chapter 7), which is
used on one-way contingency tables. 



4 

Evaluating the Decennial Census:
Past Experience



Evaluation of the decennial census is an important
element of the census process. Not only does it provide
users with some understanding of the limitations of the
information provided, but also the Census Bureau uses the
results to help improve the census methodology for
administration of the next census. This chapter describes
the various methods that have been used in the United
States to evaluate the completeness of coverage in
decennial censuses and what is known about the strengths
and weaknesses of each method. It also provides informa-
tion on comparable experience with coverage evaluation in
Canada.

Errors in the census can be classified as coverage
error or content error. Coverage errors are those that
affect the population count and include cases of omission
from the census of housing units and persons as well as
cases of erroneous enumeration or inclusion. Omissions
of persons can occur, among other reasons, because
occupied housing units and hence all of their residents
are inadvertently overlooked or are believed to be non-
residential or vacant at the time of the census, because
individual members of a household are not reported by the
household, because persons with more than one usual place
of residence, such as college students away from home or
persons with a vacation home, are not counted at either
address, and because some persons do not have usual
places of residence as the term is commonly used.
Erroneous enumerations also can occur for many reasons:
for example, because persons who moved between Census Day
and field follow-up are enumerated at both locations,
because persons with more than one usual residence are
enumerated more than once, because "out of scope"
persons, such as those who were born or migrated
to the United States after Census Day or were temporary
visitors to the United States, are counted, because of
fictitious questionnaires filled out by interviewers
("curbstoning"), and so on. Typically, questionnaires
using information from neighbors, landlords, etc. ("last
resort" and "close-out" cases), rather than actual contact
with residents, are not treated as coverage errors unless,
upon checking in a coverage evaluation program, the
information turns out to have been erroneous.

Net coverage error is the difference between total 
(gross) omissions from the census and total (gross) 
erroneous inclusions. The main goals of coverage evalua- 
tion programs are to measure the net coverage error for 
the total population of the nation and, when possible, 
for important demographic subgroups and subnational 
geographic areas. 
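
The definition of net coverage error can be made concrete with a small worked sketch. The function names and all of the numbers are hypothetical, and expressing the result as a rate relative to the implied true population is a common convention assumed here rather than something stated in the text.

```python
def net_coverage_error(gross_omissions, gross_erroneous_inclusions):
    """Net coverage error: gross omissions minus gross erroneous inclusions."""
    return gross_omissions - gross_erroneous_inclusions


def net_undercount_rate(census_count, gross_omissions, gross_erroneous_inclusions):
    """Net error as a share of the implied true population (assumed convention).

    The implied true population is the census count, less the erroneous
    inclusions it contains, plus the persons it omitted.
    """
    net_error = net_coverage_error(gross_omissions, gross_erroneous_inclusions)
    implied_true_population = census_count + net_error
    return net_error / implied_true_population


# Hypothetical area: 980 persons counted, 50 missed, 30 counted in error.
net_error = net_coverage_error(50, 30)
rate = net_undercount_rate(980, 50, 30)
```

With these made-up figures the net error is 20 persons on an implied population of 1,000, a net undercount of 2 percent, even though gross errors total 80 persons. This is why gross omissions and erroneous inclusions are worth measuring separately as well as in net form.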

Content error includes errors in reported characteris- 
tics such as age and income. Estimates of net coverage 
error for particular population groups in the census 
often reflect the joint effects of enumeration error and 
content reporting error. For example, estimates of the 
net coverage of a particular age group will include the 
effects both of net coverage of people in the age group 
and of the net transfer of people to and from the age 
group as a result of age misreporting. 

Census error evaluation studies are carried out for a 
number of purposes. Historically, evaluation results
(both estimates of coverage and content errors) have been
used to help improve the methodology for subsequent cen-
suses and to suggest promising avenues for research and
testing leading to other methodological changes. They 
have also been disseminated to users to provide general 
information on the quality of the data. In recent years, 
the possibility has been discussed of using evaluation 
results to adjust census statistics in order to improve 
the accuracy of the census counts. To date, the census
operations most closely related to adjustment both
occurred in 1970, when two programs (the National Vacancy
Check and the Post-Enumeration Post Office Check) were
conducted on a sample basis and the results were used to
generate imputations of occupied housing units, occupants,
and their characteristics in the census. The two programs
accounted for 0.5 and 0.2 percent, respectively, of the
total 1970 population count (see Chapter 5).

This chapter describes and assesses programs designed
to evaluate completeness of census coverage of the popu-
lation, excluding for the most part consideration of
content error evaluation programs (discussed briefly in
Chapter 6).

Studies directed solely to evaluation of coverage of
housing units and not persons are also excluded. Chapter
5 reviews specific findings from both population and
housing coverage evaluation programs regarding gross
undercount and overcount among groups in the population.

Finally, the discussion concerns only direct estimates
of net national undercount derived from coverage evalua-
tion programs. Methods of making small-area estimates
(for example, synthetic estimates that apply national
undercount rates to subnational geographic areas) are not
considered, nor are methods for "strengthening" direct
estimates (for example, the Fay-Herriot methodology em-
ployed in the 1970s to adjust census income statistics
used as input for postcensal per capita income estimates
for allocation of general revenue sharing funds). Chapter
7 discusses possible uses of coverage evaluation results
for adjustment purposes, including methods for carrying
adjustments down to smaller geographic areas and methods
for strengthening the estimates. Chapter 8 presents the
panel's suggestions and recommendations for improved
methods of coverage evaluation for the 1990 census.



METHODS OF COVERAGE EVALUATION 

Broadly speaking, there are two major classes of coverage
evaluation techniques: micro-level methods and macro-
level methods. Micro-level or direct methods are based
on case-by-case analysis of samples of units such as
persons or households. Macro-level or analytic methods
involve analysis of aggregate census data, including
comparison of census totals with external data (such as
vital statistics or other records) and analysis of
internal consistency (for example, analysis of sex ratios
by age group and of cohort changes between censuses). A
variety of micro-level methods have been used in the past
for coverage evaluation. An important distinction among
micro-level methods is the source of the evaluation data:
administrative records or survey data. The main macro-
level method is demographic analysis. A variety of
methodological procedures exists for both micro and macro
approaches, and both approaches have been used for coverage
evaluation.

Micro-level case-by-case coverage evaluation methods
usually require two samples to estimate net coverage
error. The first is the "P sample," or sample of the
population from a source other than the census itself.
The P sample provides an estimate of gross underenumera-
tion. The second is the "E sample," or enumeration sample
selected from the census itself. By definition, the E
sample cannot contain any missed persons but is made up
of both correct and erroneous enumerations, and therefore
provides a basis for estimating these components. The
union of the P and E samples provides estimates of net
coverage error.

Fellegi (1984) has classified micro-level coverage 
evaluation methods by treatment of the P sample: 

"Do it again, but better." This involves a 
post-enumeration survey (PES) in which a sample of areas 
is revisited by specially selected and trained enumerators 
who try to do a better job of counting than the census. 

"Do it again, independently." This involves an 
independent survey that is matched to the census. 
Typically, the results are used to develop net coverage 
estimates with so-called capture-recapture or dual-system 
techniques. The 1980 Census Post-Enumeration Program 
(PEP), which matched the April and August Current
Population Survey (CPS) records to the census, is an
example. Matches of CPS to census records in 1950, 1960, 
and 1970, which were also carried out for purposes of 
content evaluation, produced as by-products estimates of 
gross omissions only. 

Reverse record checks. In this method, samples are
drawn from four frames: (1) persons counted in the
previous census, (2) postcensal births, (3) postcensal
immigrants, and (4) persons determined through coverage
evaluation to have been missed in the previous census.
The sampled persons are located to determine whether they
are still residing in the area, and the resulting estimated
number of residents is compared with the census total. The 1960
census in the United States tested a reverse record check 
approach; Canada relies heavily on this method for 
coverage evaluation. 

Administrative records matches. By these methods, 
records or samples of records from one or more adminis- 
trative systems (for example, social security records) 
are matched to the census. The Census Bureau has con- 
ducted coverage studies of specific population groups 
based on administrative records matches, for example, 
using Medicare data to study coverage of persons 65 and 
over. One method, sometimes called the composite list,
has been detailed in Ericksen and Kadane (1983; they
refer to it as the "megalist" method).
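
The capture-recapture or dual-system technique mentioned under "Do it again, independently" can be sketched in its simplest textbook form. This is not the estimator used in the 1980 Post-Enumeration Program, which involved many refinements; the function name and the counts are hypothetical, and the sketch assumes that the census and the survey are independent, that matching is error-free, and that erroneous enumerations have already been removed from the census count.

```python
def dual_system_estimate(census_count, survey_count, matched_count):
    """Simple dual-system (capture-recapture) estimate of the true population.

    census_count  -- persons correctly enumerated in the census
    survey_count  -- persons found by the independent survey (P sample)
    matched_count -- survey persons who were also found in the census
    """
    if matched_count <= 0:
        raise ValueError("at least one matched person is required")
    # The share of survey persons found in the census estimates the census
    # coverage rate, so the census count is divided by that share.
    return census_count * survey_count / matched_count


# Hypothetical area: 900 census persons, 100 survey persons, 90 matched.
estimated_population = dual_system_estimate(900, 100, 90)
estimated_undercount_rate = (estimated_population - 900) / estimated_population
```

In this made-up case 90 percent of the survey persons were found in the census, so the 900 census persons are taken to represent 90 percent of an estimated 1,000, for a net undercount of about 10 percent. The estimate is only as good as the independence and matching assumptions behind it.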

There are other possible sources for the P sample that
have been suggested or experimented with in the past:

* Records of household composition generated by
participant observers in local areas. (Experience with a
single participant observer study in 1970 is described in
Chapter 5.)

* Multiplicity or network surveys, in which census
respondents are asked for names of relatives, such as
parents or children, not living in their household. This
approach was used with limited success in the 1977
Oakland pretest for the 1980 census (see discussion in
Chapter 5). It was also used to evaluate coverage in the
1978 Richmond dress rehearsal, but most of the analysis
was never completed.

* Lists generated by localities. The 1980 census
included a provision for local review of preliminary
field counts as a coverage improvement method (see Chapter
5). The Census Bureau also evaluated coverage in New
York City with reference to local lists furnished by the
city as part of its lawsuit protesting the 1980 census
count (see Ericksen, 1983; Ericksen and Kadane, 1983).
However, no attempt has been made to base evaluation or
adjustment of the census on lists or other ad hoc data
supplied by localities.



COVERAGE EVALUATION PRIOR TO 1980: MICRO-LEVEL METHODS 

The completeness of the census count and the quality of 
the data have concerned census officials and data users 
since the first census in 1790. However, formal evalua- 
tion of the census originated in the mid-twentieth cen- 
tury. (The discussion in this section of the history of 
census coverage evaluation programs in the United States 
draws heavily on Bureau of the Census, 1978b, no date-a.)
The social and economic problems of the 1930s and 1940s 
stimulated increased interest in census data for policy 
purposes and correspondingly increased interest in the 
accuracy of the figures. The development of probability 
sampling methods and improvements in vital statistics 
records over the two decades prior to 1950 made it 
possible to develop reasonable measures of accuracy. 




There was no formal coverage evaluation effort in 
conjunction with the 1940 census, although outside 
researchers carried out limited macro-level analysis of 
coverage among certain age groups. 

The Census Bureau experimented successfully with 
micro-level coverage evaluation programs using post- 
enumeration survey techniques for the 1945 Census of 
Agriculture, the 1947 Census of Manufactures, and the 
1948 Census of Business. These efforts led to the 
decision to evaluate coverage in the 1950 Census of 
Population and Housing using a large post-enumeration 
survey. 



The 1950 Census Post-Enumeration Survey 

The post-enumeration survey coverage evaluation method- 
ology used in the 1950 census (see Bureau of the Census, 
1960) was predicated on the notion that errors in the 
census were largely due to failures to implement cor- 
rectly census definitions and procedures and to imper- 
fections in materials and procedures that led to 
respondent misunderstanding and reporting error. Hence, 
the approach used to evaluate both coverage and content 
errors was to "do it again, but better." 

The 1950 PES used a combination area and list sample. 
A sample of the land area of the United States was used 
to identify erroneous omissions of entire households (P 
sample) . A list sample of persons enumerated in the 
census was also used to: (1) check within-household 
errors in population coverage, both omissions and 
erroneous enumerations, (2) identify erroneous inclusions 
of entire households, and (3) measure the quality of 
answers to specific census questions (content evaluation).
The area sample contained 280 primary sampling units,
3,500 segments (generally containing 6-10 housing units),
and about 21,000-25,000 households. To reduce costs in 
canvassing two independent samples, the list sample was 
largely drawn to include most of the households in the 
area sample segments. 

To obtain a high level of accuracy in the PES, 
interviewers were very carefully selected, trained, and 
supervised; more detailed questions were asked than in 
the census; and interviewers were instructed to obtain 
responses from each adult rather than allow proxy 
responses. The per case cost of the PES was about 20
times the per case cost of the census itself. Interviewing
took place in August and September 1950.

The interviewers for the area sample were required to
make a complete canvass of their assigned segments
any dwelling units not included in the list sample
the sample segments as possibly omitted from the census,
and obtain housing information for these units and infor-
mation for each person living in them as of Census Day,
April 1. (Hence, the 1950 PES used the household com-
position rule later termed PES-A, of determining the
persons living at the address as of Census Day, as
opposed to the rule termed PES-B, of determining where
the persons found by the PES were actually living on
Census Day.)

Interviewers for the list sample were to visit each
household, determine whether there were other people who
should have been enumerated at that address as of April
1, determine whether one or more persons or the whole
household was erroneously enumerated, and obtain responses
to housing and population questions for purposes of
content evaluation. All the interviewer records from
both the area and list samples were then matched to the
census files.

The net undercount estimated by the PES was 2.1
million persons or almost 1.4 percent, the difference
between erroneous omissions (2.2 percent) and erroneous
inclusions (0.9 percent). The gross errors included
persons who were counted in the wrong place and, hence,
showed up as omissions for one place and erroneous
inclusions for another. At the national level, such
omissions and inclusions balance out. The PES also
provided estimates of gross and net coverage error for
the four regions of the country (Northeast, North
Central, South, and West) by urban/rural residence and
for population subgroups classified by age, race, sex,
and various socioeconomic characteristics such as 
and occupation. 
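
The relationship between gross and net coverage error described above can be illustrated with a short sketch. The function name and the two-place example are hypothetical and serve only as illustration; the national rates are the rounded figures quoted in the text. Their difference comes out at 1.3, slightly below the "almost 1.4 percent" reported, presumably because the published components are rounded.

```python
def net_error(omissions, erroneous_inclusions):
    """Net coverage error: gross omissions minus gross erroneous inclusions."""
    return omissions - erroneous_inclusions

# Hypothetical illustration: one resident of place A is counted in place B.
gross_errors = {
    "A": {"omissions": 1, "erroneous_inclusions": 0},
    "B": {"omissions": 0, "erroneous_inclusions": 1},
}
local_net = {place: net_error(**counts) for place, counts in gross_errors.items()}

# The misplacement shows up locally but cancels at the national level.
national_net = sum(local_net.values())

# Rounded national rates quoted in the text (percent of population).
pes_1950_net_rate = round(net_error(2.2, 0.9), 1)

print(local_net, national_net, pes_1950_net_rate)
```

The sketch shows why gross error rates can be substantial for small areas even when the national net figure is modest.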

Evidence from several other sources, including a
quality check conducted as part of the PES, demographic
analysis, and independent record checks for selected
population groups, indicated that the PES net coverage
error was too low, probably by as much as 2 percentage
points (see discussion in a later section regarding
demographic analysis and independent record checks). The
PES quality check, which involved withholding from the
list sample interviewers some names of persons actually
enumerated in the census, found that the interviewers
missed about 12 percent of these people. It appears
that interviewers were less effective in identifying
cases in which the census missed one or more members of 
an enumerated household than in identifying errors 
involving whole households, in part due to the problem of 
persons moving between Census Day and the PES. The PES 
by design did not include transient quarters, such as 
hotels, and hence missed a population group believed to
have a high net undercount in the census. However, the 
major reason postulated for the understatement of net 
undercount estimated by the PES is what is often termed 
"correlation bias," namely, the tendency for the PES to 
miss, although perhaps to a lesser degree, the same types 
of people who are missed in the census (see discussion in 
a later section of this chapter).



Coverage Evaluation in the 1960 Census 

The experience with the 1950 Post-Enumeration Survey led 
the Census Bureau to undertake a more elaborate coverage 
evaluation program for the 1960 census (see Marks and 
Waksberg, 1966). The program included another post- 
enumeration survey and several kinds of record checks, 
including a reverse record check, in addition to 
demographic analysis. 



The 1960 Post-Enumeration Survey 

The 1960 PES again used two samples, an area sample 
and a list sample. The area sample contained 2,500 
segments comprising about 25,000 housing units drawn from 
the 1959 Survey of Components of Change and Residential 
Finance. Enumerators were instructed to list all struc- 
tures and housing units in their segments, to reconcile 
their listings with Survey of Components and Residential 
Finance data and census data, and to identify missed 
housing units and the number of people living in them. 
The list sample was selected independently of the area 
sample and comprised a national sample of about 15,000 
housing units and group quarters drawn from census 
enumerators' listing books for about 2,400 enumeration
districts in 335 primary sampling units covered in the 
Current Population Survey. The sample averaged about two 
clusters of three housing units each per enumeration 
district. The list sample interviewing began in May,
only one month after Census Day, to endeavor to minimize
problems stemming from movers and lack of recall on the
part of respondents. The interviewers were given a
list of housing units, but not the names of the occupants,
and were instructed to enumerate independently the
ascertaining the household composition as of Census Day
as well as the composition in May. The interviewer
records from the area and list samples were matched to
census records, with special efforts made to determine
(a) whether persons in each unit in May who were not
resident there on Census Day had been enumerated some-
where else and (b) whether the actual residents of the
unit on Census Day had all been included.

The 1960 PES estimated the national net undercount at
3.3 million people, or 1.9 percent of the total popula-
tion. The area sample provided estimates of persons in
missed or erroneously enumerated housing units and the
list sample provided estimates of missed or erroneously
enumerated persons in otherwise enumerated households.
The program produced net undercount estimates for age,
race, and sex population groups but not for any other
population classifications nor for subnational geographic
areas. Again, evidence from demographic analysis and
other sources indicated that the PES underestimated the
net undercount, probably by about 1 percentage point.
The 1960 PES estimated a higher proportion of missed
persons in otherwise enumerated households compared with
the 1950 effort (about two-fifths of the total of missed
persons in 1960 compared with only one-quarter in 1950),
but was still considered to have been relatively more
successful in identifying missed households than missed
persons within households.



1960 Record Check Studies of Specific Population Groups

The 1960 census coverage evaluation program included
several record check studies developed in response to
evidence that post-enumeration surveys tend to miss the
same kinds of people that the census misses. Two
studies were directed toward evaluation of coverage of
specific population groups, namely college students and
elderly persons. Based on samples of students enrolled
in college in spring 1960 and elderly recipients of social
security benefits in March 1960, the Census Bureau esti-
mated that the census experienced a gross undercount of
between 2.5 and 2.7 percent of college students and
to 5.7 percent of the elderly (Marks and Waksberg, 1966).



The 1960 Census Reverse Record Check 

The Census Bureau also carried out a reverse record
check study to estimate net national undercount in 1960,
similar to the methodology used in Canada. For the
reverse record check in 1960 (see Bureau of the Census,
1964b), the Census Bureau constructed an independent
sample of the population as of April 1 from four sampling
frames:

(1) Persons enumerated in the 1950 census;

(2) Children born after April 1, 1950, and before
April 1, 1960, as registered with state bureaus of
vital statistics;

(3) Persons missed by the 1950 census but found by the
1950 PES; and

(4) Aliens registered with the Immigration and
Naturalization Service as resident in the United
States in January 1960.

The sample totaled about 7,200 persons (excluding about
100 persons found to be "out of scope" because they had
died or moved out of the country or for some other
reason), of a universe believed to consist of about 98
percent of the total U.S. population. Population groups
not represented in the four samples included:

(1) Persons missed by both the 1950 census and the
1950 PES;

(2) Persons missed in the 1950 census in Alaska and
Hawaii, which the 1950 PES did not cover;

(3) Citizens (mostly Puerto Ricans) outside the
continental United States in 1950 but in the
United States in 1960;

(4) Unregistered intercensal births;

(5) Aliens arriving after 1950 who became citizens
before 1960;

(6) Aliens entering the United States between February
1 and April 1, 1960; and

(7) Aliens resident in January 1960 but not registered
with the Immigration and Naturalization Service.

Two population groups were represented twice in the four
samples:

(1) Persons missed in 1950 at their usual place of
residence and erroneously enumerated at another
address (represented both as missed persons in the
PES and enumerated persons in the census) and

(2) Aliens registered in 1960 and enumerated in the
United States in 1950.

The Census Bureau attempted to trace each sample
person to his or her address as of April 1, 1960; obtain
responses to a questionnaire (by mail or in person if
necessary), verifying the address and providing charac-
teristics data to assist in determining enumeration status
in the census; and match each questionnaire to the census
records. For persons not found in the census or where
there was doubt as to the person's enumeration status,
the Census Bureau made further efforts to determine
whether the person was counted.

Despite their best efforts, a definite match status
(counted in the census or missed) could not be assigned
to almost 1,200 of the sample cases (16.5 percent of the
total). Over three-fourths of the failures to match were
due to an inability to obtain a current address. Of
the four samples, the sample drawn from the 1950 PES had
the highest proportion of cases for which a definite
match status could not be assigned, over 24 percent.

Using different assumptions about the rate at which
the census counted persons for whom a definite enumeration
status could not be obtained, the Census Bureau estimated
gross omission rates from the reverse record check of
between 2.6 and 4.7 percent of the total population.
(The range for the sample drawn from the 1950 PES was
to 10.5 percent and for the sample of registered aliens
7.3 to 15.4 percent.) Subtracting the PES estimate of
1.3 percent erroneous enumerations in the census gave
estimates of net undercoverage of between 1.3 and 3.4
percent. Marks and Waksberg (1966) narrowed the range of
reasonable net undercount estimates from the reverse
record check to a band of 2.5 to 3.1 percent. These
estimates compare with the net undercount estimate of 1.9
percent from the PES and an estimate of 2.7 percent from
demographic analysis. The small sample size of the
reverse record check and uncertainties stemming from
match failures precluded deriving coverage estimates for
population subgroups or subnational geographic areas from
the 1960 reverse record check.
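
The arithmetic behind these ranges can be sketched as follows. The first function simply subtracts the erroneous enumeration rate from the gross omission rates quoted in the text. The second is a simplified, unweighted illustration of how the assumption made about unresolved cases moves the gross omission rate; its case counts and assumed miss rates are hypothetical and are not taken from the 1960 study.

```python
def net_undercoverage(gross_omission_rate, erroneous_enumeration_rate):
    """Net undercoverage (percent) = gross omissions - erroneous enumerations."""
    return round(gross_omission_rate - erroneous_enumeration_rate, 1)

def omission_rate(resolved, missed_resolved, unresolved, assumed_miss_rate):
    """Unweighted gross omission rate (percent) when unresolved cases are
    assumed to have been missed at `assumed_miss_rate` (a proportion)."""
    total = resolved + unresolved
    missed = missed_resolved + assumed_miss_rate * unresolved
    return 100.0 * missed / total

# Rates quoted in the text (percent).
net_low = net_undercoverage(2.6, 1.3)
net_high = net_undercoverage(4.7, 1.3)

# Hypothetical counts, chosen only to show the sensitivity to the assumption.
same_as_resolved = omission_rate(6000, 150, 1200, 150 / 6000)
pessimistic = omission_rate(6000, 150, 1200, 0.15)

print(net_low, net_high, same_as_resolved, pessimistic)
```

The wider the share of unresolved cases, the wider the band between the two assumptions, which is one reason match failures limited what the study could support.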



Reverse Record Checks in Canada 

Since 1961, Canada has relied on reverse record check
methodology to estimate the completeness of coverage
achieved in its quinquennial censuses. Fellegi
(1980a:280) notes that demographic analysis, given its
vulnerability to migration estimates, is not useful in
Canada because emigration is both significant and not
well measured. He notes as well (1980a:281) the problems
with the assumption that the probability of being missed
in a survey is independent of the probability of being
missed in the census as reasons not to use the Canadian
monthly Labour Force Survey as the basis for constructing
net coverage estimates. This problem is largely obviated
by a reverse record check, which does not rely on dual-
system estimation. Matching problems are also less
consequential, since an estimate of the total population
can be prepared after tracing, without matching with the
census files. (Matching may play a useful role during
tracing; it is also needed to identify a set of micro
records of persons missed by the census for use during
evaluation of the next census.)

The Canadian reverse record check program (see Gosselin
and Theroux, 1977, 1978a, 1978b, 1979) combines samples
from four mutually exclusive but together almost compre-
hensive sampling frames: (1) the previous census, (2)
the register of intercensal births, (3) the list of
intercensal immigrants, and (4) persons missed in the
previous census as identified in that census's reverse
record check. Conceptually, a sample drawn from these
four frames covers all persons to be enumerated in the
current census, except illegal immigrants and unregistered
births (the latter are rarities in Canada because of its
"baby bonus" program). Fellegi (1980a:281-282) summarizes
the reverse record check (RRC) procedures as follows:

The operation consists of meticulously tracing the 
current address of every selected person, and then 
of checking the census records to see whether they 
were included there. The key to the success of 
the project is the tracing operation and we were 
able to trace conclusively in each of the last 
four censuses about 95 per cent of the selected 
persons. ... As part of the tracing operation 
all selected persons who appear to have been 
missed by the census are contacted by an inter-
viewer, partly to find out whether there may have
been another address at which they could have been
enumerated, partly to collect some basic census
information from them. As a result, the RRC
project provides not only estimates of national and
provincial under-enumeration rates, but it also
results in a microdata base [that] . . . has a
rich analytic potential to describe the "profile"
of those missed by the census.

The reverse record check of the 1976 Canadian census
estimated undercoverage of both persons and occupied
housing units at about 2 percent (Fellegi, 1980a:28
No E sample has been used to date in Canada (though one
is planned for 1986) because the emphasis has been on
deriving information for lessening the gross undercount.
Furthermore, Canada does not make use of the same array
of coverage improvement programs as the Census Bureau
that probably contribute to overenumerations.



Coverage Evaluation in the 1970 Census 

Because of the problems with the post-enumeration survey
methodology used to estimate net undercoverage in the
1950 and 1960 censuses, the Census Bureau made no plans
to carry out a comparable program for the 1970 census and
placed chief reliance on the method of demographic
analysis (see discussion in a later section). Several
other programs, including the CPS-Census Match and record
checks for specific population groups, contributed to
knowledge of coverage problems.



The CPS-Census Match 

The approximately 56,000 households included in the
March 1970 Current Population Survey sample were matched
to the 1970 census records. Although the match was per-
formed primarily for purposes of content evaluation
(using the subsample of about 10,000 CPS households that
received the census long forms), it also served to
evaluate coverage of housing units. Data on missed
persons were tabulated but never published (see Siegel,
1975).

The Census Bureau constructed estimates of gross 
undercoverage for the total population and subgroups from
the CPS-Census Match using dual-system estimation
techniques. The CPS-Census Match estimated a gross under-
coverage rate of 2.3 percent for all persons, compared
with the demographic analysis estimate of 2.2 percent.
The CPS-Census Match estimate is higher than the demo-
graphic estimate, at least in part because of the absence
of an E sample to estimate erroneous overenumerations in 
the census. The CPS-Census Match estimates were adjusted 
for additions to the census count resulting from imputa- 
tions based on the National Vacancy Check, the Post- 
Enumeration Post Office Check, and some "close-out" 
procedures, but they were not adjusted for erroneous 
additions to the census, such as duplicate enumerations. 



Record Check Studies of Specific Population Groups 

The Census Bureau carried out two record check studies 
in 1970 directed toward coverage evaluation. For the 
Medicare Record Check, a sample of approximately 8,000 
persons age 65 and over was selected from Medicare health 
records and matched to the 1970 census records. The over- 
all gross omission rate for this population group as 
estimated from the study was 4.9 percent, a somewhat 
lower rate than that estimated for elderly social security 
recipients in 1960. 

For the D.C. Driver's License Study, the Census Bureau 
matched driver's license records with census records for 
about 1,000 males, ages 20-29, living in a set of selected 
tracts in the District of Columbia, and who obtained or 
renewed their licenses in the District of Columbia between 
July 1969 and June 1970. About 14 percent of the cases 
were identified as missed in the census, with an addi- 
tional 10 percent who were probably missed but for whom a 
definite match status could not be determined. This 
project was designed as a feasibility study. Analysts 
recommended that future studies: (1) narrow the sampling 
time reference, (2) update the address information after 
sampling and before matching, and (3) have the Postal 
Service review the list prior to matching. 



COVERAGE EVALUATION PRIOR TO 1980: MACRO-LEVEL METHODS 

Researchers inside and outside the Census Bureau have 
used aggregate methods to assess the completeness of 
census coverage since the beginning of coverage evalua-
tion efforts in the United States. The principal
macro-level method is termed demographic analysis,
whereby independent estimates for the population in
various categories (typically, age, sex, and race) are
constructed and compared with the census counts. In
ideal form, the process of constructing an independent
estimate for an age-race-sex group, for example, black
men ages 25-29 in 1970, is as follows:

(1) Obtain from vital statistics records the count of
births of black males occurring between April 1, 1941,
and April 1, 1945 (and apply appropriate corrections for
underregistration);

(2) Subtract the count (obtained from vital statistics
records) of deaths occurring between April 1, 1941, and
April 1, 1970, to black males born in the above time
period;

(3) Add the count (from Immigration and Naturaliza-
tion Service statistics) of black male immigrants to the
United States born in the above time period who arrived
between April 1, 1941, and April 1, 1970;

(4) Subtract the count (estimated as best possible) of
black male emigrants from the United States born in the
above time period who left the country between April 1,
1941, and April 1, 1970.

The resulting estimate can then be compared with the 1970
census count for black men ages 25-29 to determine the 
net undercount or overcount of that group in the 
population. 
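
The four steps above amount to a simple accounting identity, sketched below. The function names, the default completeness value, and all of the counts are hypothetical and are chosen only for illustration; expressing the net undercount as a percentage of the demographic estimate is also an assumption of this sketch.

```python
def demographic_estimate(births, deaths, immigrants, emigrants,
                         registration_completeness=1.0):
    """Expected cohort size: corrected births - deaths + immigrants - emigrants.

    `registration_completeness` is the proportion of births that were
    registered; dividing by it corrects for underregistration (step 1).
    """
    corrected_births = births / registration_completeness
    return corrected_births - deaths + immigrants - emigrants

def net_undercount_percent(expected, census_count):
    """Net undercount as a percentage of the demographic estimate.
    A negative value indicates a net overcount."""
    return 100.0 * (expected - census_count) / expected

# Hypothetical cohort counts.
expected = demographic_estimate(
    births=980_000, deaths=100_000, immigrants=20_000, emigrants=10_000,
    registration_completeness=0.98,
)
rate = net_undercount_percent(expected, census_count=850_000)

print(round(expected), round(rate, 1))
```

Each input carries its own error, so the quality of the result depends on how well births, deaths, and migration are measured for the cohort in question.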

The above procedure is fairly reliable for population
groups for which birth registration data are complete 
(essentially those born in 1935 or later), for which 
illegal immigration is negligible, and for which 
emigration is also negligible. Data sources are not 
available that permit accurate estimates either of 
illegal immigration or of emigration. Given the gaps in
the data, various methods have been used to construct
"demographic" estimates for particular population groups.
For example, data from Medicare records are currently
used to construct independent estimates of the population
age 65 and over, rather than using the demographic method
outlined above.

Demographic analysis cannot be performed for population
groups defined according to other characteristics, such 
as income or education, because of the absence of appro- 
priately classified registration information. It has 
also not been possible to use the method for subnational
geographic area coverage estimates, because of lack of
data on internal migration flows. The method provides
estimates of net national coverage error for age-sex-race
groups for which illegal immigration and emigration are
small, but it does not distinguish among the components
of error: gross omissions, gross overenumerations, and
content errors such as age misreporting. Nonetheless,
the method has been extensively developed in the United
States and is regarded as providing more accurate esti-
mates than other methods of net undercoverage for the
1950, 1960, and 1970 censuses at the national level. The
sections below briefly review the history of demographic 
analysis of census coverage in the United States prior to 
1980. 



Demographic Analysis Prior to 1950 

After the 1940 census there was some macro-level analysis 
by outside researchers of the completeness of the cover- 
age. With a grant from the Social Science Research 
Council, Daniel O. Price (1947) compared aggregate census
data for men ages 21-35 by race with selective service 
registration data and estimated a net undercount of 3 
percent for all men and 13 percent for black men in this 
age group. P.K. Whelpton of the Scripps Foundation for 
Population Research, using vital statistics data, esti- 
mated that white and nonwhite children under age 5 had 
net undercount rates, respectively, of over 6 percent and 
over 15 percent in the 1940 census (Bureau of the Census, 
1944). 



Demographic Analysis in 1950 

Ansley Coale of Princeton University carried out an 
extensive analysis (1955) to develop demographic esti- 
mates of the population in 1950. For age groups under 
15, he used birth registration statistics as the basis 
for his estimates. For older groups, ages 15-64, he 
relied on comparisons with the results of preceding 
censuses (with appropriate allowances for mortality and 
net immigration) . An important assumption underlying his 
method was that within each age-sex group the relative 
net undercoverage was identical in the 1930, 1940, and 
1950 censuses. For persons age 65 and older, Coale used 
the 1950 Post-Enumeration Survey results. Coale estimated
net undercount for the total population in 1950 at 3.5
percent, or 5.4 million people.

Coale's net undercount estimate of 3.5 percent is 2.5
times the estimate from the PES of 1.4 percent. The
Census Bureau developed a "minimum reasonable estimate"
of 2.4 percent net undercount based on PES results for
persons age 40 and older, birth registration data for
persons under 15, and examination of sex ratios for
persons ages 15-39 (Bureau of the Census, 1960:5-6).
Demographic analysis at the Census Bureau subsequently
led to refinements in Coale's 1955 estimate. The latest
estimate (Siegel, 1974) puts the 1950 census net under-
count at 3.3 percent of the total population.

The 1950 census evaluation program included a Test of
Birth Registration, designed to evaluate the completeness
of vital statistics on births. (A similar study was
carried out as part of the 1940 census.) For 1950, census
enumerators filled out cards for infants born between
January 1 and April 1, 1950. These cards were matched to
birth registration records for the corresponding period.
The results, used in Coale's work and other demographic
analysis studies, indicated that the registration system
recorded 98 percent of all births (compared with 93
percent in 1940), including 99 percent of white births
and 94 percent of all other births (Siegel and Zelnik,
1966).
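
A matching study of this kind yields a completeness ratio that can then be used to inflate registered birth counts. The sketch below is a simplified illustration: the identifiers, counts, and function names are hypothetical, and it ignores the sampling weights and matching errors that a real study would need to handle.

```python
def registration_completeness(infant_card_ids, registered_birth_ids):
    """Share of enumerated infants whose birth is found in the registration records."""
    cards = set(infant_card_ids)
    if not cards:
        raise ValueError("at least one infant card is required")
    matched = cards & set(registered_birth_ids)
    return len(matched) / len(cards)

def adjust_births(registered_births, completeness):
    """Inflate a registered birth count to allow for underregistration."""
    return registered_births / completeness

# Hypothetical data: 50 infant cards, 49 of which match a registered birth.
infant_cards = range(1, 51)
registered = set(range(1, 50)) | {101, 102}  # 101 and 102 have no infant card

completeness = registration_completeness(infant_cards, registered)
adjusted = adjust_births(980, completeness)

print(completeness, round(adjusted))
```

Registered births with no matching infant card do not enter the completeness ratio; they bear instead on census coverage of infants.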

An extension of this project, the Infant Enumeration
Study (Bureau of the Census, 1953), matched birth records
for January through March 1950 to the infant cards filled
out by census enumerators to assess completeness of cover-
age for newborns. The study found that about 96 percent
of infants under 3 months old were enumerated. In about
82 percent of cases in which an infant was missed, the
parents were also missed.



Demographic Analysis in 1960 

Census Bureau staff and university scholars carried out 
several studies in the early 1960s to evaluate 
completeness of coverage in the 1960 census. Siegel and
Zelnik (1966) summarized these studies and presented a
"preferred analytic composite" estimate of 3.1 to 3.2
percent net undercount of the population in 1960.

Undercount percentages for persons under age 25 were
derived from population estimates for this group based on
birth registrations (adjusted for underregistration using
the results of the 1940 and 1950 birth registration tests),
registered deaths, and estimated net external migration.
The estimation method used for whites age 25 and older 
was quite complex. The method represented an extension 
to 1960 of coverage estimates for 1950 published in Coale 
and Zelnik (1963) for the native white and total white 
population by age and sex. 

Coale and Zelnik constructed estimates of annual 
births and birth rates for the native white population 
from 1855 to 1934 using single-year age distributions 
available in every census since 1880. For each cohort, 
they estimated the proportion that could be expected to 
survive each intercensal decade based on mortality data, 
then used those figures (adjusted for immigration) to 
estimate the number of births that should have occurred a 
certain number of years before a given census to account 
for the number of persons enumerated at a certain age in 
that census. This description is an oversimplification, 
as various complex adjustments were required to attempt 
to compensate for deficiencies in the census single-year 
age distributions and the mortality records. From the 
estimated annual birth data for native whites, Coale and 
Zelnik constructed estimates of total population, by age
and sex, then of coverage errors for the native and total 
white population of each age and sex group in each census 
from 1880 to 1950. 

The 1960 coverage estimates for nonwhite women 25 years 
and older represented extensions of the estimates devel- 
oped by Coale (1955) for this group in 1950, using an 
iterative technique assuming that age patterns of under- 
count were similar in the 1930, 1940, and 1950 censuses. 
The 1960 coverage estimates for nonwhite men 25 and older 
were the result of applying expected sex ratios to the 
estimated nonwhite female population (whose coverage 
errors are lower) and comparing the results with the 
census counts. 
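
The sex ratio step can be sketched as follows. This is a minimal illustration that takes the sex ratio to be males per 100 females; the female estimate, the expected ratio, and the census count are all hypothetical.

```python
def expected_males(female_estimate, males_per_100_females):
    """Male population implied by a female estimate and an expected sex ratio."""
    return female_estimate * males_per_100_females / 100.0

def net_undercount_percent(estimate, census_count):
    """Net undercount as a percentage of the independent estimate."""
    return 100.0 * (estimate - census_count) / estimate

# Hypothetical inputs for one age group.
males = expected_males(female_estimate=500_000, males_per_100_females=95.0)
rate = net_undercount_percent(males, census_count=430_000)

print(males, round(rate, 1))
```

Any error in the female estimate or in the expected ratio passes directly into the male estimate, so the approach is only as good as those two inputs.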

Siegel (1970) updated the 1960 coverage estimates to 
incorporate population estimates for the elderly based on 
Medicare data. Siegel (1974) published the current 
"preferred" estimate that the 1960 census undercounted 
the population by 2.7 percent or 5 million people. The 
preferred demographic analysis estimate is 1.4 times the 
estimate from the 1960 PES and falls within the range of 
2.5 to 3.1 percent estimated in the 1960 reverse record 
check. 



Demographic Analysis in 1970 

The Census Bureau relied on demographic analysis as the
principal method of evaluating coverage in the 1970
census. The data used for the demographic estimates
included birth and death statistics, life tables, immigra-
tion data, Medicare enrollments, and data from previous
censuses. Siegel (1974) published a range of estimates,
with the "preferred" estimate that the 1970 census
undercounted the population by 2.5 percent. The range of
estimates stemmed from differing assumptions regarding
net undercount in the 1960 census and population change
in the following decade.

The preferred estimates were developed as follows.
Population estimates for persons under 35 were based on
adjusted birth statistics, projected forward to 1970 by
accounting for deaths and estimated net migration. The
birth data were adjusted for underregistration using the
results of the 1940 and 1950 birth registration tests and
of another study of completeness of birth registration
for 1964-1968 (Bureau of the Census, 1973e). For the
latter study, a sample of about 15,000 children born
between 1964 and 1968 who were included in the Current
Population Survey or the Health Interview Survey from
June 1969 to March 1970 were matched to birth records.
The study found that 99.2 percent of births during the
period were registered, including 99.4 percent of white
births and 98 percent of all other births.

Population estimates for white women ages 35-64 in
1970 represented extensions to 1970 of the 1950 estimates
for white women ages 15-44 developed by Coale and Zelnik
(1963) based on estimating annual births. Population
estimates for black women ages 35-64 represented exten-
sions to 1970 of 1960 estimates for these cohorts
developed by Coale and Norfleet W. Rives, Jr. (1973).
The Coale and Rives study constructed estimates of the
black population and of black birth rates for the period
1880 to 1970, starting with the assumption that the
"true" population in 1880 could be represented by " 
tables of stable population age distributions. 

The population estimates for white and black men ages
35-64 were derived by applying expected sex ratios (males
per 100 females) to the corresponding female population
estimates. Finally, estimates of persons 65 and over
were derived from Medicare data, adjusted for persons not
enrolled and further adjusted for consistency with
expected sex ratios for the elderly population.




Subsequent work with new data permitted refinements in 
the 1970 coverage estimates. The latest estimate is that 
the 1970 census had a net undercount rate of 2.2 percent 
(Passel et al., 1982). The revision was primarily attri- 
butable to increased allowances for emigration during 
1960 to 1970, for Medicare underregistration at ages
65-69 in 1970, and to small changes in estimated com- 
pleteness of birth registration for 1935-1970. 

As described more fully in Chapter 5, demographic 
analysis estimates of net undercount in every census 
since 1950 indicate better coverage for women, on aver- 
age, than for men, and for whites than for persons of 
other races. During the 1970s, Census Bureau researchers 
endeavored to develop coverage estimates for states 
(Siegel et al., 1977) and for the growing Hispanic 
population (Siegel and Passel, 1979) , but these efforts 
were frustrated by lack of reliable data. The growing 
interest in coverage estimates for subnational areas and 
for other population groups besides blacks and whites, 
coupled with severe data problems, such as absence of 
data for estimating illegal immigration and emigration 
and for reliably estimating internal migration, led the 
Census Bureau to decide that demographic analysis could 
not be the principal coverage evaluation method for 
1980. The Census Bureau planned, in addition to demo- 
graphic analysis, to carry out a program to match an 
independent survey to the census (the P sample) and 
recheck a sample of census records (the E sample) . The 
results would be used, through dual-system estimation, to 
construct coverage estimates for the nation, states, and 
large metropolitan areas. The 1980 Post-Enumeration 
Program and the demographic analysis efforts carried out
for 1980 are discussed in detail below. 



THE 1980 POST-ENUMERATION PROGRAM 

In 1980, the Census Bureau implemented a coverage 
evaluation program closely related to the post-enumeration 
surveys used in conjunction with the 1950 and 1960 
decennial censuses. The aim of this program, called the 
1980 Post-Enumeration Program (PEP) , was to provide 
estimates of net undercoverage in the 1980 census, with
considerable geographic detail, possibly down to the 
level of states and large cities. 

The basic methodology of the 1980 Post-Enumeration 
Program was "do it again, independently." Thus, the 




sample recount was not intended to be more complete than
the census, only independent of the census. If the
independence assumption applies, then the estimate of the
number missed can be arrived at via a model similar to
the one used in the estimation of wildlife populations,
called capture-recapture. When used in the context of
the census, the model is referred to as dual-system
estimation (see Marks et al., 1977, and Sekar and Deming,
1949, for an early use of the capture-recapture
methodology in a census context).

In the case of wildlife populations, a sample of the
population is taken and identified or tagged. It is
assumed that every member of the population has an equal
chance of being tagged. Then another independent sample
is taken. Again at this second stage, it is assumed that
every member of the population has an equal chance of
being tagged, although not necessarily the same chance as
at the first stage. The population thus falls into four
mutually distinct groups: (1) those caught the first and
the second time, (2) those caught the first time but not
the second time, (3) those missed the first time and
caught the second time, and (4) those missed both times.
The total population is the sum of these four groups. 
The difficulty is that the fourth group's number is 
unknown. 

At this point the assumption of independence is used.
The population that was caught the first time is estimated
to have the same probability of being captured the second
time as the total population. Thus, the percentage of
the population caught the first time that is also captured
the second time is assumed to be the same as the per- 
centage of the entire population that is captured the 
second time. 

It has long been thought that either the independence
assumption or the assumption of equal capture
probabilities or both are very likely seriously in error. The
failure of the assumption of independence is sometimes
referred to as "correlation bias." It is commonly
believed (and partially supported by Valentine and
Valentine, 1971) that certain persons, such as
undocumented aliens, others wishing to avoid detection by
authorities for a variety of reasons, and those either
with multiple residences or living in quarters that are
not clearly residential, are missed by both the census
and sample surveys more frequently than the joint
assumptions of independence of capturing mechanisms and
equal capture probabilities would yield.




In the application of dual-system estimation to the 
1980 census, the two assumptions of equal capture 
probability at each stage and the independence of capture 
probabilities, which are generally believed not to hold,
were modified as follows. The Census Bureau stratified 
the population into subpopulations and used dual-system 
estimation separately in each stratum. Thus the two 
assumptions used were that the individuals within these 
strata had equal probabilities of capture and that the 
capturing mechanisms operated independently within 
strata. The strata were defined using the variables age,
race, sex, ethnicity, and area of residence. 

Notationally, for each stratum, if $n_1$ are caught the
first time, $n_2$ are caught the second time, $m$ are caught
both times, and we are interested in estimating $N$, the
total population, it follows from the two assumptions
given above that:

$$\frac{m}{n_1} = \frac{n_2}{N}, \quad \text{and hence} \quad \hat{N} = \frac{n_1 \times n_2}{m}.$$
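
The estimator and the effect of stratification can be illustrated with a minimal Python sketch. The function name `dual_system_estimate` and the two strata with their counts are hypothetical, chosen only so that the arithmetic is easy to follow; they are not data from the 1980 program. Stratum A stands for 800 people each captured with probability 0.9 at both stages, stratum B for 200 people each captured with probability 0.5.

```python
def dual_system_estimate(n1, n2, m):
    """Dual-system (capture-recapture) estimate of total population.

    n1: number caught the first time (e.g., the census)
    n2: number caught the second time (e.g., the survey)
    m:  number caught both times (matched cases)
    """
    if m <= 0:
        raise ValueError("at least one matched case is needed")
    return n1 * n2 / m


# Hypothetical expected counts (n1, n2, m) per stratum, assuming
# independence and equal capture probabilities WITHIN each stratum:
#   A: 800 people, p = 0.9 -> n1 = n2 = 720, m = 648
#   B: 200 people, p = 0.5 -> n1 = n2 = 100, m = 50
strata = {"A": (720, 720, 648), "B": (100, 100, 50)}

# Apply the estimator separately in each stratum and add the results.
stratified_total = sum(dual_system_estimate(*c) for c in strata.values())

# Ignore the strata and pool all counts before estimating.
pooled_n1 = sum(c[0] for c in strata.values())
pooled_n2 = sum(c[1] for c in strata.values())
pooled_m = sum(c[2] for c in strata.values())
pooled_total = dual_system_estimate(pooled_n1, pooled_n2, pooled_m)

print(f"stratified estimate: {stratified_total:.1f}")
print(f"pooled estimate:     {pooled_total:.1f}")
```

In this constructed case the stratified estimate recovers the true total of 1,000, while the pooled estimate falls short of it, because capture probabilities differ between the two strata.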

Parallel to the wildlife example, in the 1980 census 
dual-system estimation model, the census served as the 
first method of capture and the Census Bureau used the 
Current Population Survey sample of households as the 
post-enumeration survey, the second capture mechanism. 

Unlike the wildlife situation, it is not always easy 
to ascertain whether an individual was included in the 
census. The individuals themselves cannot reliably 
respond to this question, partly because a member of a 
household may not know whether another member completed a 
questionnaire that included them. To determine which 
people counted in the post-enumeration survey were also 
counted in the census, a match of individuals in the 
post-enumeration survey and the census was carried out, 
matching by information on name, address, sex, age, and 
race. Ideally, one would need to match both the PES to 
the census and vice versa to be able to identify 
inaccuracies and errors in both the lists. While it is 
conceptually straightforward to search the census records 
for Current Population Survey records, the procedure used 
in 1980 did not facilitate matching a substantial 
percentage of census records to the Current Population 
Survey. The purpose of such reverse matching could be to 
determine records in the census that were either erroneous 
enumerations or duplications. Therefore, as mentioned 
above for previous post-enumeration surveys, a sample of 
census records was taken, called the E sample. 




Below we give details concerning both the P sample and
the E sample. 

Finally, there are two populations that were not
sampled by the Current Population Survey: individuals in
military barracks and the institutional population. For
these populations, a supplemental sample was drawn. We
do not describe the treatment of these samples of group
quarters further here. 



The P Sample 

The P sample comprised the April and August 1980 samples
of the Current Population Survey. Samples for two months
were used so that the overall sample size would be large
enough to provide fairly reliable estimates of
undercoverage for states and major cities. (Due to the current
design of the Current Population Survey, samples taken four
months apart have no overlap.) Each CPS month included
about 70,000 households and about 185,000 individuals.

To obtain the P sample information, the Current
Population Survey interview was supplemented with two
pieces of information: (1) a sketch map of the major
roads so that residences could be located unambiguously
on census maps and (2) for the August CPS interview, a
list of all places of residence of each person between
January 1 and the time of the interview, as well as
information that would help to validate the geographic
locations. The address for each interview was then
geocoded to the census enumeration district. The census
questionnaires for that enumeration district and those surrounding
that enumeration district were then clerically searched
for a person with closely matching information on name,
address, sex, age, and race. 

If no matching census record could be found, or if
there was not enough information to ascertain match
status, a follow-up interview was attempted. An important
asymmetry should be noted here. The Current Population
Survey enumerations were followed up for verification,
but the census enumerations were considered valid without
follow-up. The E sample was intended to account for this
asymmetry. 



The E Sample 

The E sample, as mentioned above, was designed to check
the accuracy of the information provided in the census,




specifically, to count possible overenumeration from each
of the following sources: (1) records placed into the
wrong census enumeration district, (2) records resulting 
from erroneous enumerations, e.g., individuals born after 
Census Day, and cases fabricated by census enumerators 
(called curbstoned cases), and (3) multiple records for 
the same individual. For measurement of net error, these 
cases would be subtracted from the gross undercoverage 
estimates. 

In the administration of the E sample, 100,000 census 
questionnaires were selected and follow-up enumerators 
were sent into the field to detect erroneous enumerations 
and incorrect geocoding. A 50 percent subsample of the 
questionnaires in the E sample was also checked for 
duplicates. 



The Combined Estimate 

The dual-system estimation was then modified to reflect
geocoding errors, erroneous enumerations, and
duplications; these add wrongly to the number of nonmatches in
the census and therefore should be subtracted from the
census count. (Also subtracted from the census count
was the number of field imputations^1 that were also
not matchable by CPS records.) If $d$ represents the
estimated number of duplications in the census, $g$ the
estimated number of persons placed into the wrong
enumeration district, $ee$ the estimated number of
erroneous enumerations, and $i$ the number of field
imputations, the estimate of the total population then
becomes:

$$\hat{N} = \frac{(n_1 - d - g - ee - i) \times n_2}{m}.$$
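
As a minimal sketch of this adjustment, the following Python function subtracts the four correction terms from the census count before applying the dual-system formula. The function name and all input values are hypothetical and serve only to show the arithmetic; they are not figures from the 1980 Post-Enumeration Program.

```python
def adjusted_dual_system_estimate(n1, n2, m, d=0, g=0, ee=0, i=0):
    """Dual-system estimate with the census count corrected for
    duplications (d), geocoding errors (g), erroneous enumerations (ee),
    and field imputations (i), following the formula in the text."""
    corrected_census = n1 - d - g - ee - i
    if corrected_census < 0:
        raise ValueError("corrections exceed the census count")
    if m <= 0:
        raise ValueError("at least one matched case is needed")
    return corrected_census * n2 / m


# Hypothetical counts for a single stratum.
estimate = adjusted_dual_system_estimate(
    n1=1000,  # census count
    n2=500,   # post-enumeration survey count
    m=475,    # survey cases matched to the census
    d=10, g=20, ee=15, i=5,
)
print(estimate)
```

With these inputs the corrected census count is 950, so the estimate is 950 x 500 / 475.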



There were inconsistencies between the treatment of 
census and the CPS cases, which also further complicated 
matters. For example, the degree of verification of 
inclusion in the two lists differed, with the census 
operating under somewhat more flexible rules concerning 
what individuals might serve as a substitute respondent 
for a particular individual. This difference might have 



1 Field imputations are those that did not result from 
damaged questionnaires. 






[...] in the models [...]. [...] program, the quality of the
collected data and the data processing affect the
reliability of the results. [...] always enough information
[...] whether two records match. Another implicit assumption
is that, given sufficient information to ascertain the match
status of two records, there are no errors in the matching
algorithm. Slight relaxations of these assumptions, e.g., that
they hold for a very large majority of the cases, would
probably still permit calculation of reliable estimates. In
the remainder of this section we discuss the incompleteness in
the P and E samples of the 1980 PEP and how the Census
Bureau [...].

[...] (1) [...] noninterviews in the CPS; (2) failure to
attempt or [...] follow-up interviews for initially unmatched
CPS cases; and (3) failure to obtain acceptable follow-up
interviews for initially unmatchable CPS cases [...] April 1
addresses. For the E sample there is one primary source of
missing data: failure to obtain acceptable follow-up
interviews.

In April 1980, the [...] Current Population Survey (P Sample)
[...] that were refusals, temporarily absent, or did not [...]
for other reasons was about 4.4 percent. One approach was to
treat these cases as nonrespondents. Another approach was to
search CPS records for interviews in preceding and succeeding
months for these cases, thus permitting attempts to match them
to the census. (The CPS is a rotating survey and respondents
are asked to furnish information for a total of 8 months in a
[...] period.) The majority of these households could be
matched to the census. The successful search for interviews in
neighboring months was referred to as a Type A noninterview.
Type A noninterviews were used for [...] census undercoverage
and not used for [...]




Besides the decision of whether to include Type A 
noninterviews, two other decisions arose over the 
inclusion or exclusion of certain data. First, when the 
E sample search was unable to determine where a person
included in the census resided on Census Day, postal
carriers were consulted. These data were not considered
by the Census Bureau to be of high quality (see Cowan,
1983). Second, the information on date and place of
residence from August CPS movers (i.e., CPS interviewees
in August who had lived elsewhere on Census Day) was
determined also not to be of high quality.

Most CPS interviews that did not match to the census 
were followed up in the field in order to: (1) determine
or confirm April 1 address and (2) improve the quality of 
information on the precise geographic location of the 
Census Day address. Of the cases in which the follow-up 
interview was complete, the majority (60 percent) had 
been correctly geocoded and the correct enumeration 
district's questionnaires had been searched for a match.
Therefore these cases were given the status of not matched 
to the census. However, some follow-up interviews were 
not attempted or completed. Of those that were com- 
pleted, some were considered to be unacceptable because 
the interviewers could not follow the fairly strict pro- 
tocol on required features such as self-response. 
Finally, for a large number of cases, the follow-up 
interview was completed but the resulting address infor- 
mation was incomplete or in some other way not precise 
enough to determine the proper enumeration district for 
the residence. Occasionally the respondent reported "I
don't know" or refused to answer.

A rough summary of the incompleteness in the April P 
sample is provided in Table 4.1. The situation for the 
August P sample is worse, primarily due to the problems
with movers, as noted above. As a rough approximation,
by adding the 4.4 percent rate of refusals for households
to the 4.0 percent rate of unresolved matches (this is an
approximation because percentages of households and
individuals cannot strictly be added together), we find
that, for over 8 percent of the people in the
PEP, a match status could not be determined from the data
collected. This compares with the percentage net
undercount, which is probably less than 2 percent on a
national level.

In order to take account of these missing data, the
Census Bureau used various forms of imputation, as well
as other approaches described below, to arrive at 12




TABLE 4.1 Response and Match Resolution Rates for April 1980 Current
Population Survey (P Sample) by Race

Percentage of April P Sample Households by Race and Response Status
for Series 3 Estimates(a)

Race                 Responded   Did Not Respond

Total                  95.6            4.4
White                  95.7            4.3
Other                  94.5            5.5

Percentage of April P Sample Individuals in Responding Households by
Race and Ethnicity and Match Status for Series 3 Estimates

Race and Ethnicity   Matched   Nonmatched   Unresolved

Total                  92.6        3.4          4.0
Black                  85.8        7.4          6.8
Nonblack Hispanic      87.7        5.5          6.8
Other                  93.8        2.7          3.5

(a) Series 3 estimates, described more fully in the text, did not use Type A noninterview information
but reweighted these cases to behave identically to the interviewed cases.

SOURCE: Wolter (1983:Exhibit B).



different sets of estimates of undercoverage for states,
major cities, and remainders of states. The formation of
these 12 estimates resulted from various choices
concerning which CPS month to use, the treatment of information
considered of questionable quality, and the treatment of
unresolved matches. The Census Bureau used a combination
of weighting and imputation, the weighting representing
essentially imputation of the average matching rate for
individuals of the same demographic characteristics.
Thus, weighting did not make use of as much information
as the imputation, e.g., it did not use information about
the cause of the incompleteness.
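
The idea that weighting amounts to imputing the group's average matching rate can be sketched in Python using the match-status percentages from Table 4.1. The function `weighted_match_rate` is a hypothetical simplification, not the Census Bureau's actual weighting procedure: each unresolved case is given the match rate observed among the resolved cases in the same group.

```python
# Percentages of April P sample individuals from Table 4.1:
# (matched, nonmatched, unresolved)
TABLE_4_1 = {
    "Total": (92.6, 3.4, 4.0),
    "Black": (85.8, 7.4, 6.8),
    "Nonblack Hispanic": (87.7, 5.5, 6.8),
    "Other": (93.8, 2.7, 3.5),
}


def weighted_match_rate(matched, nonmatched, unresolved):
    """Match rate (percent) when each unresolved case is assigned the
    average match rate of the resolved cases in its group."""
    resolved_rate = matched / (matched + nonmatched)
    imputed_matches = matched + unresolved * resolved_rate
    return 100.0 * imputed_matches / (matched + nonmatched + unresolved)


rates = {group: weighted_match_rate(*row) for group, row in TABLE_4_1.items()}
for group, rate in rates.items():
    print(f"{group}: {rate:.1f} percent matched after weighting")
```

Under this simplification the result is the same as dropping the unresolved cases and renormalizing, which is why weighting uses no information about why a case was left unresolved.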

Some of the choices resulting in the 12 estimates
were: (1) choosing whether to use Type A noninterviews
for matching decisions or using weighting to assign
matched status, (2) choosing whether to use Post Office
information for the E sample or to treat these cases as
noninterviews and use weighting to assign match status,
(3) choosing whether to use information from any movers
in August or to treat them as noninterviews and use
weighting to assign match status, and (4) in general, for
both the E sample and the P sample, choosing to use
weighting for all incomplete cases or to use imputation.
As a result of these choices, plus the choice of which
CPS month (April or August) to use as well as other
choices not mentioned here, the Census Bureau developed
27 estimates of undercoverage for states and major
cities. Later, this number was reduced to 12 estimates






TABLE 4.2 Scheme to Identify Various 1980 PEP Estimates

Code    Month   P Sample Treatment / E Sample Treatment

2-9     Apr     P  With Type A noninterviews
                E  Post Office results considered noninterviews
3-9     Apr     P  Without Type A noninterviews
                E  Post Office results considered noninterviews
2-8     Apr     P  With Type A noninterviews
                E  With Post Office results
3-8     Apr     P  Without Type A noninterviews
                E  With Post Office results
5-8     Aug     P  Movers used
                E  With Post Office results
5-9     Aug     P  Movers used
                E  Post Office results considered noninterviews
10-8    Aug     P  Movers treated as noninterviews
                E  With Post Office results
3-20    Apr     P  Without Type A noninterviews
                E  Incomplete cases treated as simple noninterviews
2-20    Apr     P  With Type A noninterviews
                E  Incomplete cases treated as simple noninterviews
14-20   Apr     P  Incomplete cases treated as simple noninterviews
                E  Incomplete cases treated as simple noninterviews
14-8    Apr     P  Incomplete cases treated as simple noninterviews
                E  With Post Office results
14-9    Apr     P  Incomplete cases treated as simple noninterviews
                E  Post Office results considered noninterviews

NOTE: Every estimate in this table made use of clean-up information, essentially more extensive
efforts to collect follow-up interviews, etc.

SOURCES: Cowan and Bettin (1982:14); Cowan (1983:32-33).



that the Census Bureau felt all represented reasonable 
alternatives. Table 4.2 provides the definitions of 
these 12 sets of estimates. They are denoted by a 
two-integer hyphenated code, in which the first number
describes a treatment of the P sample, and the second 
number indicates a treatment of the E sample. 
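
The coding scheme can be sketched as a small Python lookup built from the entries of Table 4.2. The dictionary and function names are hypothetical. Note that the lookup will decode any pairing of a listed P code with a listed E code, including pairings (such as 10-9) that are not among the 12 estimates in the table.

```python
# First integer: treatment of the P sample, with the CPS month used.
P_TREATMENTS = {
    2: ("Apr", "With Type A noninterviews"),
    3: ("Apr", "Without Type A noninterviews"),
    5: ("Aug", "Movers used"),
    10: ("Aug", "Movers treated as noninterviews"),
    14: ("Apr", "Incomplete cases treated as simple noninterviews"),
}

# Second integer: treatment of the E sample.
E_TREATMENTS = {
    8: "With Post Office results",
    9: "Post Office results considered noninterviews",
    20: "Incomplete cases treated as simple noninterviews",
}


def decode_estimate(code):
    """Split a code such as '3-8' into its P and E sample treatments."""
    p_part, e_part = code.split("-")
    month, p_treatment = P_TREATMENTS[int(p_part)]
    return {"month": month, "P": p_treatment, "E": E_TREATMENTS[int(e_part)]}


print(decode_estimate("3-8"))
print(decode_estimate("10-8"))
```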

There are a number of reasons, both a priori and a 
posteriori, supporting the various individual estimates 
from this list of 12 estimates. For example, estimate 
10-8 reduces the problem for movers when using the August 
P sample. Also the handling of incomplete interviews in 
the 14 and 20 series of estimates is similar to that used 
in the Canadian census. These points among others are 
detailed in Bailar (1983c) . 

The use of these 12 estimates produced very different 
estimates of undercoverage for national demographic
groups, as shown in Table 4.3. Some analysts have sug- 
gested that the number of acceptable estimates should be 




TABLE 4.3 1980 PEP Estimates of Percentage Undercoverage for
Demographic Groups at the National Level

Estimate                          Nonblack
Code       National    Black     Hispanic

2-9           1.4       6.7        5.6
3-9           1.3       6.3        5.3
2-8           1.0       5.6        4.4
3-8           0.8       5.2        4.1
5-8           1.6       4.3        6.4
5-9           2.0       5.4        7.6
10-8          0.2       2.7        3.6
3-20          1.6       6.9        5.5
2-20          1.7       7.2        5.8
14-20        -0.3       2.5        1.2
14-8         -1.0       0.7       -0.2
14-9         -1.1       2.0        1.0



SOURCE: Cowan and Bettin (1982:Tables HI- 1,12). 



narrowed considerably. For example, Ericksen (198^ 
would discard all but the 2-8, 2-9, 3-8, and 3-9 estimates
as either based on August data, which had a higher rate
of cases with unresolved match status, or as making use
of extreme assumptions in the adjustments for missing
data. However, even within this restricted set, the
national undercount rate ranges from 0.8 to 1.4 percent.
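
The spread of the estimates can be checked directly from Table 4.3. The following Python sketch stores the table and computes the range of the national figures for the full set of 12 estimates and for the restricted set of four; the variable and function names are hypothetical.

```python
# Table 4.3: code -> (national, black, nonblack Hispanic), in percent.
PEP_ESTIMATES = {
    "2-9": (1.4, 6.7, 5.6),
    "3-9": (1.3, 6.3, 5.3),
    "2-8": (1.0, 5.6, 4.4),
    "3-8": (0.8, 5.2, 4.1),
    "5-8": (1.6, 4.3, 6.4),
    "5-9": (2.0, 5.4, 7.6),
    "10-8": (0.2, 2.7, 3.6),
    "3-20": (1.6, 6.9, 5.5),
    "2-20": (1.7, 7.2, 5.8),
    "14-20": (-0.3, 2.5, 1.2),
    "14-8": (-1.0, 0.7, -0.2),
    "14-9": (-1.1, 2.0, 1.0),
}

# The four April estimates retained in the restricted set.
RESTRICTED = ("2-8", "2-9", "3-8", "3-9")


def national_range(codes):
    """Smallest and largest national undercoverage among the given codes."""
    values = [PEP_ESTIMATES[code][0] for code in codes]
    return min(values), max(values)


print("all 12 estimates:", national_range(PEP_ESTIMATES))
print("restricted set:  ", national_range(RESTRICTED))
```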



1980 DEMOGRAPHIC ANALYSIS 

In the years prior to 1980, demographic analysis had
provided what were considered to be the most trustworthy
estimates of undercoverage for certain demographic groups
at the national level. However, demographic analysis for
1980 is generally considered to be significantly less
accurate than for any of the previous three censuses
(although the reliability of some components of the estimates
probably improved). In this section, we briefly describe
the data sets and models used to calculate the 1980
demographic analysis estimates of undercoverage. (The main
source for this section is Passel, 1983.)

The demographic method developed a preliminary 
estimate of the April 1980 national population of 2 
million based on the so-called preferred estimate of
undercount for 1970 (see Bureau of the Census, 1974 
The 1980 decennial census counted 226.5 million. It was
generally assumed that some undocumented aliens were




counted in the census but that a sizable percentage were 
not. Since the demographic estimates incorporated 
undocumented aliens only indirectly by assuming that net 
illegal immigration was equal to another unknown, namely 
emigration of legal residents, it was generally assumed
that the census had experienced an undercount nationally, 
but it was difficult to estimate how much. 

As more and improved data concerning the components of 
population change, births, deaths, and legal migration 
became available, better estimates were made. These esti- 
mates (Passel et al., 1982) were an improvement over the 
April estimates; however, they could not make use of data 
on recent fertility, mortality, or immigration nor of 
1980 Medicare data, which were not yet available. 

Work has continued on improving coverage estimates 
based on demographic analysis. To understand the nature 
of these improvements, we discuss in turn each of the 
major data components of demographic analysis separately 
for persons under and over age 65. 
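
The way these components combine can be sketched with a simple balance calculation for a cohort followed from birth. This Python sketch is a minimal illustration under stated assumptions: the function names and all numbers are hypothetical, and the net undercount rate is taken here as the shortfall of the census count relative to the demographic estimate.

```python
def expected_population(births, deaths, immigrants, emigrants):
    """Demographic estimate for cohorts followed from birth:
    births minus deaths plus net migration."""
    return births - deaths + immigrants - emigrants


def net_undercount_rate(expected, census_count):
    """Net undercount as a percentage of the demographic estimate
    (an assumed definition for this sketch)."""
    return 100.0 * (expected - census_count) / expected


# Hypothetical component totals, in thousands.
expected = expected_population(
    births=1200, deaths=250, immigrants=80, emigrants=30
)
rate = net_undercount_rate(expected, census_count=985)
print(f"expected population: {expected}")
print(f"net undercount rate: {rate:.1f} percent")
```

Any error in a component, such as unmeasured emigration or undocumented immigration, passes directly into the expected population and therefore into the estimated undercount.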

The Population Under Age 65 
Birth Records 

For 1980, population estimates based on virtually 
complete birth registration can be obtained only for the 
population under age 45. However, even for the years of 
virtually complete birth registration (1935 to the 
present) , correction factors are used that are based on 
tests of the completeness of birth registration records 
for the white population. In addition, the demographic 
estimates had to use preliminary data on the number of 
births in 1979 and 1980, since final information on 1979 
and 1980 birth registration did not become available 
until late 1983. Birth registration data are incomplete 
for persons between ages 45 and 64. Coale and Zelnik
(1963) and Coale and Rives (1973), using stable population 
and other analysis methods, constructed estimates of the 
number of births for the years before 1935. 



Death Records 



Death statistics are used with very little correction 
for underregistration. Two minor exceptions are: (1) a
small adjustment for the underregistration of infant 






deaths between 1935 and 1960 and (2) the use of Medicare
records for deaths of people over age 70 between the
years 1970 and 1980. Some smoothing of the death rates
is also used. 



Legal External Immigration 

Records on immigration from 1935 to 1980 are provided
by the Immigration and Naturalization Service (INS).
Final data for most age, sex, and racial groups for the
years 1979 and 1980 had not been provided to the Census
Bureau by INS by late 1983. The overall effect on the
preliminary estimates is thought to be small. Emigration
is only indirectly measured. Differences in estimates
from consecutive censuses of the number of foreign-born
persons have provided estimates of the net change in the
foreign-born, which, when combined with immigration data,
can be used to provide estimates of emigration. This
technique is due to Warren and Peck (1980) . 



The Population Age 65 and Over 

For the population age 65 and over, Medicare data are 
used to provide estimates of coverage; however, Medicare
does suffer from a small amount of underregistration 
(Bureau of the Census, 1974a) . The population figures 
used in the demographic estimates contained adjustments 
for the underregistration. 



Undocumented Immigrants 

The most serious deficiency in the population balance
equation used for demographic analysis is the lack of
information on the net flow of illegal or undocumented
immigrants into the United States. A comprehensive
discussion of this problem is contained in a recent
National Research Council report on immigration statistics
(Levine et al., 1985). The estimates of the number of
undocumented aliens residing in the United States at the
time of the 1980 census ranged from 2 to 12 million. No
records of entries or departures are available for
undocumented aliens (although some losses to the
undocumented population through death may be included in
numbers of registered deaths), so this population is




impossible to incorporate into the standard demographic
analysis. Various attempts have been made to estimate
the size of the undocumented population, or particular
components of it, using for example the 1960, 1970, and
1980 censuses of Mexico or Immigration and Naturalization
Service data (see for example Goldberg, 1974; Bean et
al., 1983; see also Appendix 8.1). Unfortunately, the
results of these analyses are not precise enough to do
more than set broad limits, between 2 and 4 million, on
the size of the undocumented population resident in the
United States in 1980 (Levine et al., 1985).

Warren and Passel (1983) applied a modified form of
demographic analysis to the foreign-born population
enumerated by the 1980 census to estimate the number of
undocumented aliens included in the census, a necessary
preliminary step to estimating undercoverage of the
legally resident population. Upon removing the
undocumented aliens included in the overall count, this method
estimated that the census had a 0.5 percent national
undercount of the legally resident population, with a 5.3
percent undercount for blacks and a 0.2 percent overcount
for whites and other races (see Chapter 5). These numbers
agree fairly well with PEP estimates 3-8 and 2-8, given
in Table 4.3.

Many other factors besides the quality of the data
sets are involved in the reliability of estimates based
on demographic analysis. For example, there is a large
amount of uncertainty in the racial and ethnic
categorization used in demographic analysis, both between censuses
and between any census and other sources, such as vital
statistics records. Furthermore, the models of the
completeness of birth registration are themselves based on
data that made use of matching studies, which are
generally error-prone. Finally, subnational estimation
procedures using demographic analysis are still undergoing
research. The possibility of developing useful estimates
in the near future appears to be small, due to the lack
of estimates of interstate migration (see Siegel et al.,
1977).



RECENT USE OF ADMINISTRATIVE LISTS FOR COVERAGE EVALUATION

The existence of administrative records, for example the
Internal Revenue Service personal income tax files,
Medicare records, and social security records, raises the
possibility of basing a coverage evaluation program on




these administrative lists. Roughly speaking, one could
match samples of these lists (various possibilities have
been suggested for how this might be done) and then make
use of dual-system estimation or some generalization to
estimate the undercoverage of the census list (see
Appendixes 4.1 and 4.2 for discussions of multiple list
methods and the matching of administrative records). A
major advantage of such an approach is that
household-based coverage evaluation programs, such as the
Post-Enumeration Program, are better designed to account for
missed households rather than missed individuals in
enumerated households. Quite possibly, a large number of
these missed individuals are included on administrative
lists.

The use of administrative records and multiple list
methods as a major component of a decennial census
coverage evaluation program on a national level has not
been attempted in the United States. However, the Census
Bureau has performed several tests on a national basis.
The Census Bureau has also used administrative records in
studies of gross omissions for limited populations.
Examples of the former include the IRS/Census Direct
Match Study (see Childers and Hogan, 1984a, and Chai :c 
below) and a three-way CPS/Census/IRS match study (see
Hogan, 1984a). Examples of the latter include a study
that matched Medicare records with census questionnaires
(see Bureau of the Census, 1973d) and two studies that
matched, respectively, social security beneficiaries and
college students with census files (see Marks and
Waksberg, 1966, and previous discussion above). None of
the studies mentioned except the as yet unreported
CPS/Census/IRS match study explored the difficulties of
using more than one list besides the census list.
Considering the differential undercoverage present in any
one currently proposed list, such testing is highly
desirable.

Assuming that at least two lists are to be used (along
with the census list), there are two primary methods
proposed that make use of multiple lists to estimate the
rate of census omission. The first method, composite
list formation, merges all but the census list into a
super list or composite list. (Sampling is almost
certainly used in the merging process due to the expense of
matching two large files.) The composite list is then
matched with the census file. The estimation of the rate
of omission, arrived at by estimating the number of
people not represented by either the composite list or the






census, follows from the use of dual-system estimation
described above and in Appendix 4.1. The second
technique, which we call the multilist method, proceeds
by completely matching samples from every administrative
list with each other and with the entire census list.
This results in a multidimensional contingency table with
a count in every cell determined by an individual's
inclusion or omission by the various lists. The cell
representing the number of individuals missed by every
list must be estimated in order to estimate the omissions
in the census. (These two methods, composite list
and multilist, as well as a third, less often proposed
method using covariate information for modeling within
the contingency table, are discussed more fully in
Appendix 4.1.)
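
To illustrate what estimating the empty cell involves, the following Python sketch treats the case of three lists (for example, two administrative lists and the census). It assumes a log-linear model with no three-way interaction among the lists, for which the missing cell has a closed-form estimate. This is one common choice from the capture-recapture literature and is an assumption of the sketch, not necessarily the model discussed in Appendix 4.1; the function name and the counts are hypothetical.

```python
from itertools import product


def missed_by_all_three(counts):
    """Estimate the number of people on none of three lists.

    counts maps inclusion patterns (a, b, c), with 1 = on the list and
    0 = not on the list, to observed counts for the seven observable
    cells. Assumes no three-way interaction among the lists.
    """
    numerator = (
        counts[1, 1, 1] * counts[1, 0, 0] * counts[0, 1, 0] * counts[0, 0, 1]
    )
    denominator = counts[1, 1, 0] * counts[1, 0, 1] * counts[0, 1, 1]
    if denominator == 0:
        raise ValueError("a two-list cell is empty; estimate undefined")
    return numerator / denominator


# Hypothetical data: 1,000 people, three independent lists that each
# include any person with probability 0.5, so each observable cell has
# an expected count of 125.
observed = {
    cell: 125 for cell in product((0, 1), repeat=3) if cell != (0, 0, 0)
}

missing = missed_by_all_three(observed)
total = sum(observed.values()) + missing
print(f"estimated missed by every list: {missing:.0f}")
print(f"estimated total population:     {total:.0f}")
```

In this constructed case the lists are fully independent, so the estimate recovers the 125 people who appear on no list; with real lists, dependence among them would bias the result unless it is modeled.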

Composite list and multilist methods each make certain
assumptions. Failure of these assumptions would cast
serious doubt on the reasonableness of the resulting
estimates of omission rates. When using the composite
list method or when completely matching all lists, it is
necessary that:



(1) The lists are available for the entire United
    States;

(2) There exists an identifier, such as social
    security number, or a suitable number of common
    responses that permit matching;

(3) There is very low item nonresponse and misresponse
    for variables used in matching;

(4) The addresses on the various lists are the address
    of residence; and

(5) There are few false matches and few false
    nonmatches, and the treatment of unresolved
    matches through imputation of matching status is
    effective.^3

When using the composite list method, it is also
necessary that:

(1) The merged list have little differential
undercount and

(2) Either the first-order independence assumption
used in dual-system estimation nearly hold or the
degree of dependence be well estimated.

3 We point out that it would be desirable to provide
quantitative bounds instead of the qualitative
expressions used. However, it is currently not possible
to do so due to the lack of research.

In using the multilist method, it is also necessary
that:

(1) The various lists be weighted so that no
identifiable subpopulation is differentially
underrepresented on any list and

(2) Either the higher-order independence assumption
used nearly hold or the degree of dependence be
well estimated.*

* Especially when using a large number of lists, it is
helpful (but not necessary) that the lists contain few
people who are not supposed to be included in the census,
such as people who died before Census Day.

Investigation of the above requirements for a successful
national coverage evaluation program based on the use
of existing administrative lists would cause one to be
less than optimistic for this application of
administrative lists. However, current trends in our
society toward increasing computerization, automation,
editing, quality control, etc., are likely to increase
the possibility of meeting many of the requirements. In
addition, progress at the Census Bureau in areas such as
automated list matching should benefit the use of
administrative records for coverage evaluation.
Therefore, many of the above requirements may be within
reach in the foreseeable future.



The New York City Match

Another test of composite list methods occurred in the
lawsuit in which the City and State of New York sued the
U.S. Department of Commerce for federal funds that they
claim they were deprived of as a result of differential
undercoverage. Briefly, the plaintiffs created a
composite list, referred to as the "Megalist" (details
provided below). At the request of the plaintiffs, the
court instructed the Census Bureau to determine the
number of people on this list who were residents of New
York City on Census Day, 1980, and who were not counted
in the census. Once the number of people not included in
the census was determined, the plaintiffs arrived at an
independent estimate of the number of residents of New
York City as of Census Day, 1980. Since the test deals
only with New York City, it does not address the
difficulties faced with a national application of the
methods. However, it is currently the most-developed
application of the use of composite list methods for
coverage evaluation. (The following discussion is taken
from Ericksen and Kadane, 1983.)

In this application of composite list methods, the 
court directed that the Census Bureau use the following 
10 lists: 

(1) Consolidated Edison electricity billpayers; 

(2) Babies born immediately preceding Census Day;

(3) People who died immediately after Census Day; 

(4) New York City public school children; 

(5) Persons arraigned in city courts; 

(6) Students enrolled at the City University of New 
York; 

(7) Persons in "Medicaid Eligibility File"; 

(8) Licensed drivers; 

(9) Registered voters; and 

(10) Recipients of unemployment benefits. 

Since there are a substantial number of people included 
more than once on the above lists, and since it is not 
possible to match more than a fraction of the population 
of New York City's residents, a sampling plan was neces- 
sary in forming the composite list in order to select the 
cases to be matched to the census, as well as a procedure 
that prevented duplicates from being represented in the 
composite list. 

The formation of the composite list proceeded as fol- 
lows. (See Kadane and Lehoczky, 1976, for a full discus- 
sion of the underlying methodology.) First, 8 percent of 
the enumeration districts in New York City were 
selected. Next, samples of each list were selected with 
sampling frequency proportional to the square root of the 
expected omission rate. Finally: 

(1) The lists were numbered from 1 to 10;

(2) The sample from the first list was guaranteed
inclusion in the final list;

(3) The sample from the second list was checked
against the entire first list; and

(4) For the remaining lists, the samples were checked
against all preceding lists, with duplicates
removed.
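
The four steps above can be sketched in code. The following is a minimal illustration only, not the procedure actually used in the New York City match: it assumes that each person carries a single exact identifier, that each list is already unduplicated, and that matching is error free. The function name build_composite and the sample data are hypothetical.

```python
from typing import Hashable, Iterable


def build_composite(samples: list[list[Hashable]],
                    full_lists: list[Iterable[Hashable]]) -> list[Hashable]:
    """Merge samples from ordered lists into one composite sample.

    samples[i] is the sample drawn from list i; full_lists[i] is the
    whole of list i. A sampled person is kept only if that person does
    not appear anywhere on an earlier list (hypothetical exact-ID match).
    """
    composite = []
    seen_on_earlier_lists = set()
    for sample, full_list in zip(samples, full_lists):
        for person in sample:
            if person not in seen_on_earlier_lists:
                composite.append(person)
        # Only after the sample is processed does this list count as "earlier".
        seen_on_earlier_lists.update(full_list)
    return composite


# Hypothetical identifiers for three ordered lists.
samples = [["a", "b"], ["b", "c", "d"], ["a", "e"]]
full_lists = [["a", "b", "x"], ["b", "c", "d", "x"], ["a", "c", "e"]]
composite = build_composite(samples, full_lists)
```

Checking each sample against the entire preceding lists, rather than only against the preceding samples, is what keeps a person from being represented through more than one list.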

The final combination of lists produced a sample of
16,500 persons, representing a total population of 6.2
million, a large proportion of the approximately 7-8
million residents of New York City. The Census Bureau
was ordered to match this list with that of the census
for New York City and determine the individuals who:

(1) Were included in the 1980 census;

(2) Were not living in New York on Census Day in 1980;
and

(3) Composed the remainder of the list.

In performing the match, it was necessary to trace people
included on the composite list to their current address.
This was often necessary to determine to which of the
above three categories each of the people belonged
(details of the matching and tracing processes can be
found in Bureau of the Census, 1982d).

Due to several circumstances, among them the difficulty
of tracing people to current addresses, it was not always
possible to determine whether an individual had been
counted in the 1980 census. Of the 6.2 million cases
represented, 5.17 million were located in New York City
on Census Day. The remaining 1.03 million represented
cases that could not be determined. Of those located in
New York City, 4.75 million were determined to be counted
in the census, and 0.42 million were determined to be
missed by the 1980 census. Those persons missed represent
8 percent of the cases for which a match status could be
clearly determined. Ericksen and Kadane believe a more
reasonable estimate of the rate of omissions to be at
least 10 percent due to the treatment of imputations and
the likelihood of the undetermined cases to represent
residents of New York City.
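
The arithmetic behind the 8 percent figure can be checked directly from the counts quoted above. The sketch below also includes a simple what-if function showing how the rate moves when some undetermined cases are treated as residents; the two parameters of that function (resident_share and miss_rate) are hypothetical inputs chosen for illustration and are not taken from Ericksen and Kadane.

```python
# Weighted counts from the text, in millions of persons.
represented = 6.20
located_in_nyc = 5.17
undetermined = 1.03
counted = 4.75
missed = 0.42

# Omission rate among cases with a clearly determined match status.
base_rate = missed / located_in_nyc

# Share of all represented cases whose status could not be determined.
undetermined_share = undetermined / represented


def rate_with_undetermined(resident_share: float, miss_rate: float) -> float:
    """Omission rate if part of the undetermined cases are added in.

    resident_share: assumed fraction of undetermined cases living in
        New York City on Census Day (hypothetical input).
    miss_rate: assumed fraction of those residents missed by the census
        (hypothetical input).
    """
    extra_residents = undetermined * resident_share
    extra_missed = extra_residents * miss_rate
    return (missed + extra_missed) / (located_in_nyc + extra_residents)


print(f"base omission rate: {base_rate:.3f}")
print(f"undetermined share: {undetermined_share:.3f}")
```

Any assumed miss rate among the undetermined residents that exceeds the base rate pushes the overall rate above 8 percent, which is the direction of the argument summarized above.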

Many of the difficulties encountered by this test of
an administrative records-based coverage evaluation
program were specific to this application and are not due
to the general methodology. For example, much of the
difficulty in tracing people to current addresses was
probably due to the 28-30 month separation between the
taking of the census and the tracing operation. Ericksen
and Kadane point out their uneasiness about estimates
derived from data of which 15 percent is missing (not at
random), and they call for methods for reducing and
accommodating missing data in this context. To this end
they mention a procedure to replace hot-deck imputation
of match status for the undetermined cases that is
closely related to fractional matching, discussed in
Chapter 8. Ericksen and Kadane present various sources of
evidence for the reasonableness of their estimates.
Finally, they recommend that in future applications of
this methodology a smaller number of lists be used.



CONSIDERATIONS FOR ASSESSING ALTERNATIVE COVERAGE
EVALUATION METHODS

In assessing the merits of alternative methods of
evaluating completeness of census coverage, a number of
factors are involved: (1) the error profile for the
method, that is, the sources of error for each method,
the degree of error contributed by each source, and the
likelihood that corrective measures can reduce error; (2)
timeliness; (3) feasibility; (4) cost; (5) the extent to
which a method meets the needs for coverage estimates,
e.g., the provision of small-area estimates; and (6) the
extent to which the various sources coincide to give a
coherent picture.

Clearly, an error profile that projects optimistically
toward substantial improvement in the future is a
consideration in the choice of coverage evaluation
programs. In considering adjustment based on coverage
evaluation data, it is also important to consider the
development of an error profile for the census as well so
that the two can be compared.

Considerations of timing are important as well. A fast
program is particularly desirable if coverage evaluation
is to be the basis for an adjustment of the census counts.
It is also helpful for census users to have evaluation
results available when they are making most use of the
census, i.e., in the first few years after the census.
Beyond some thresholds, timing is partly a function of
the staff resources and effort devoted to the coverage
evaluation program. For example, although preliminary
results from the 1950 Post-Enumeration Survey were not
published until 1953 and final results not until 19[...],
the study results could undoubtedly have been made
available sooner if there had been a perceived need, with
commitment of necessary resources. Micro-level coverage
evaluation programs, however, have not been completed to
date within a shorter time span than about a year and a
half after the census. Preliminary demographic estimates
for 1980 were released in mid-1981 and are still being
revised on the basis of new data such as updated Medicare
records.

Cost is also a consideration. Rough cost estimates
for the censuses conducted from 1950 to 1970 (Bureau of
the Census, 1978b:App.A, Tables 1-3) suggest that
evaluation programs of all kinds accounted for roughly
1.[...] percent of total costs, with coverage evaluation
programs making up perhaps about half the total devoted
to evaluation. In 1980, the Post-Enumeration Program cost
about $14 million, over 1 percent of the total. These
percentages certainly do not appear high considering the
need to have good information about the accuracy of the
data. For 1990, the panel believes that increased
resources should be devoted to coverage evaluation and
other adjustment-related programs, in accord with our
recommendation to pursue vigorously a program of research
and development that could lead to the use of coverage
evaluation results for adjustment of the counts (see
Chapter 7). We have indicated ways in which cost savings
could be achieved in other areas (see Chapters 5 and
[...]) so as not to increase total costs of taking the
census.

It is also vital that coverage evaluation programs
provide information at subnational levels of aggregation
because of the concern about the equitable distribution
of political power and monies. Finally, a way in which a
coverage evaluation method can gain acceptance is for it
to provide estimates that nearly coincide with estimates
given by other coverage evaluation programs based on
substantially different statistical models.



ERROR PROFILES FOR COVERAGE EVALUATION METHODS 

An error profile of a survey is constructed by first
creating a systematic and comprehensive list of the
operations that lead to the survey results. The error
profile is then a description of the potential sources of
error for each operation, the information available on
each, and their impact on the survey estimates of
interest (Bailar, 1983a; Brooks and Bailar, 1978).

It is not a simple matter to make an exhaustive list 
of the operations of a complex survey. It is even less 
simple to make a list of the potential sources of error,
particularly in the case of a new measurement approach or
an adaptation from another field. However, it is often
true that one can identify dominant sources of error, ones
whose reduction would materially improve the reliability
of final estimates (even if one cannot exactly quantify
by how much). It is on these dominant error sources that
one must concentrate resources available for improvements 
since, without reducing them, even major error reduction 
in other components would lead to only negligible improve- 
ments in the reliability of the final product. Chapter 8 
makes several recommendations in this direction. 

The preceding sections of Chapter 4 have provided an 
overview of the development of coverage evaluation methods 
during the past 40 years. The questions remain, what is 
the current state of knowledge as to the efficacy of the 
various methods, and what are the expectations for im- 
provements in each of the various methods over the next 
few years? It is not possible to provide a direct answer 
to these questions, since research is still at an early 
stage for many of the techniques. However, by way of 
summary we present the various operations central to 
coverage evaluation and a sense of the magnitude of the 
problem faced in accomplishing each of the operations by 
the various techniques. 

The operations central to coverage evaluation are:

Direct Estimates
    Developing the completeness of the coverage evaluation survey
    Obtaining complete response in survey and census
    Tracing
    Geocoding
    Matching
    Dual-system estimation independence
    Evaluating the evaluation

Demographic Analysis
    Obtaining component data
        Undocumented aliens
        Completeness of birth registration
        Completeness of death registration
        Legal immigration and emigration
        Internal migration
    Evaluating the evaluation

We now address, in turn, each of these operations and
discuss qualitatively the resulting impact that each
operation would have on an error profile for the various
coverage evaluation techniques.



Components of Direct Estimates Programs 

Developing the Completeness of the Coverage Evaluation
Survey

Incompleteness of coverage results from the fact that
no list, whether based on surveys, administrative records,
or a combination, includes the proper representation of
the target population, in the present case every resident
of the United States. Some groups have a history of
being significantly missed in post-enumeration surveys,
for example, undocumented aliens. People in urban areas
with very low socioeconomic status are missed by
post-enumeration surveys and censuses (Valentine and
Valentine, 1971). Although reverse record checks omit
some of the persons missed in the previous census,
undercoverage is probably not as severe with this method
as with post-enumeration surveys. However, the problems
reverse record checks have in counting undocumented
aliens are probably as serious as those of
post-enumeration surveys. Administrative list methods may
have still less of a problem with undercoverage (although
other problems exist, including the fact that gaps in
coverage cannot be quantified in terms of age-sex-race
groups or other census characteristics).



Obtaining Complete Response in Survey and Census 

Incompleteness of response results from the inability
to obtain survey forms from some individuals or households
on the master address register or from the inability to
collect answers to specific questions: for example,
unreported or incomplete address was a major cause of
unresolved matches in the 1980 PEP (see Cowan and
Bett[...], 1982). Of the 8.5 percent unresolved matches
in the April P sample, over 60 percent were due either to
CPS nonresponse or to failure to complete a follow-up
interview. For reverse record checks, nonresponse is less
serious per se. However, the inability to trace a
fraction of the selected population has an effect that is
completely analogous to nonresponse. Nonresponse will
clearly affect any matching involved in methods based on
administrative lists.



Tracing 

Tracing is the process of searching, used in the
reverse record check, for a present residence address,
given information about a residence some time in the
past. As mentioned earlier, roughly 12 percent of the
sample in the 1960 test of a reverse record check in the
United States could not be traced successfully. The use
of more intensive tracing methods, as tested in the
Forward Trace Study (see Chapter 8), may reduce the rate
of failure to trace. (In Chapter 8 we mention that
intensive tracing may increase the sensitization of the
population.) Recent use of the reverse record check in
Canada has been accomplished with a fairly consistent
tracing failure rate of 5 percent. Canada has the
advantage, compared with the United States, that
addresses used for tracing are at most five years out of
date. The tracing failure in 1960 in the United States
was somewhat higher than the PEP unresolved match rate in
1980; the Canadian tracing failure rate is lower than the
1980 PEP unresolved match rate. Post-enumeration surveys
and administrative record techniques do not make use of
tracing in estimating the undercount.



Geocoding 

Geocoding is the determination of census geography,
i.e., census enumeration district, for the address of a
record included in the evaluation program. Relative to
nonresponse, geocoding does not appear to be a major
problem for the PEP. In 1980, incomplete or impossible
geocodings, when the address was complete, were
responsible for only a small part of the unresolved
matches. However, geocoding (more generally geographic
errors) affected PEP errors in a more subtle manner: by
increasing the apparent gross overenumeration and
underenumeration levels and thus putting a greater strain
on the complex estimation method, which relies on the
cancelling out of large offsetting errors. In 1990, the
Census Bureau is planning not to use geocoding as a
preliminary step prior to matching, but rather to use the
address as an alphanumeric string that can be matched
directly (see Jaro, 1985). The reverse record check does
not depend on a refined level of geocoding. Matching
administrative lists to the census depends on geocoding
to the same extent as a post-enumeration survey.



Matching 

Matching is statistically based inference as to when
information for two records on one or two lists does or
does not represent information for the same individual.
Matching problems, given complete information, were the
cause of only a small percentage of the unresolved
matches in the 1980 PEP. For reverse record checks,
matching is not intrinsically required to estimate net
undercoverage. However, matching is used in the
preliminary efforts at tracing. Multiple and composite list
techniques are, of course, exceedingly dependent on 
matching. Matching administrative lists, particularly 
local lists, is especially troublesome as the number of 
lists increases, due not only to the increased computa- 
tion but also to the likely incompatibility of many of 
the reporting and coding practices for these lists. 



Dual-System Estimation: Independence and Equal
Capture Probabilities

The problems with the assumption of independence for
the inclusion of people in the census and the evaluation
study list and the assumption of equal capture
probabilities are related to undercoverage problems
discussed above. The estimation of undercoverage when
using dual-system estimation is believed to be quite
sensitive to these assumptions. Information on
nonindependence for certain demographic groups (see
Kadane and Ericksen, 1985) has indicated the possibility
that the undercoverage rates may be higher than those
estimated assuming independence (see Chapter 8).
Administrative lists have the same sensitivity to the
failure of these two assumptions. Reverse record checks
do not use dual-system estimation to provide estimates of
undercoverage, since the underlying assumption is that
the frame used for the reverse record check sample is
essentially complete.



Balancing of Overcount and Undercount 

The difficulty of balancing over- and undercount,
faced by post-enumeration programs that do not sample
compact area clusters such as city blocks, has as yet an
unknown impact on coverage estimates. The difficulty 
arises because two different survey vehicles (the P and 
the E samples in the 1980 PEP) must exactly balance with 
respect to their estimates (confounded with other 
estimates) of the number of census enumerations assigned 
to the wrong geographic area. Since net undercount is 
estimated as a residual by subtracting the E sample 
estimates from the P sample estimates, and since both are 
substantially higher than the net undercount, the latter 
may be significantly affected by the supposedly offsetting 
errors not balancing one another. The reverse record 
check is not subject to this source of error. 
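
A small numerical sketch may make the sensitivity of a residual estimate concrete. All figures below are hypothetical and chosen only for illustration; they are not PEP results. The point shown is that a modest relative error in one gross estimate becomes a much larger relative error in the net figure obtained by subtraction.

```python
def net_undercount(p_estimate: float, e_estimate: float) -> float:
    """Net undercount as a residual: P sample estimate minus E sample estimate."""
    return p_estimate - e_estimate


# Hypothetical gross figures, in millions of persons.
p_gross = 10.0   # omissions estimated from the P sample
e_gross = 7.0    # erroneous enumerations estimated from the E sample

baseline = net_undercount(p_gross, e_gross)

# Suppose the P sample figure is overstated by 5 percent because the
# geographic errors in the two samples fail to offset one another.
perturbed = net_undercount(p_gross * 1.05, e_gross)

relative_change_in_net = (perturbed - baseline) / baseline
print(f"5 percent error in one gross figure changes the net by "
      f"{relative_change_in_net:.1%}")
```

The amplification factor is roughly the ratio of the gross figure to the net figure, so the larger the gross estimates are relative to the net undercount, the more the residual is affected.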



Components of Demographic Analysis 

Obtaining Data for Undocumented Aliens 

This component was clearly the most difficult one for 
demographic analysis in the 1980 census. Since uncer- 
tainty in the number of undocumented aliens in the United 
States is at least a million or so either way, a major 
improvement in the precision of the measures of this 
component of the population is necessary before demo- 
graphic analysis can be comfortably used in the future. 

Completeness of Birth Registration 

As time passes, an ever-increasing proportion of the 
population has been born in the period during which birth 
registration has been essentially complete. For the 1990 
census, reliable information on births will be available 
for most residents 55 or younger, nearly 80 percent of 
all residents. Since those age 65 and over will be 
covered by Medicare information, a fairly small group 
will remain with less reliable sources of registration. 
Furthermore, the development of models of the
completeness of birth registration is proceeding. This is
not a component in need of great attention.



Completeness of Death Registration 

Death registration is virtually complete and makes a
minor contribution to the error in demographic analysis.



Legal Immigration and Emigration 

Data on immigration are subject to problems, but
the most problematic component is emigration, for which
there is as yet no direct estimate. Relative to other
components, legal migration is probably subject to more
error than birth and death records, but to less than
undocumented immigration. An interesting by-product of
the reverse record check is the possibility of an
estimate of emigration.



Internal Migration 

Censuses have obtained data on interstate mobility,
but their quality is generally considered inadequate for
use in coverage estimates. This precludes the use of
these data for the development of subnational estimates
of undercoverage.



Evaluating the Evaluation 

To better determine where the difficulties lie in the
application of coverage evaluation programs, especially
demographic analysis, there is a need to develop measures
of variability of the estimates of undercoverage.
Sensitivity analyses should also be carried out to
determine the effect of various assumptions, input data
quality, etc., with respect to the resulting estimates.
This is discussed further in Chapter 7.

When one is considering the major sources of error in
coverage evaluation programs, it is necessary to [...]
the results with an investigation of the errors in the
census. Coverage evaluation programs are intended to
measure errors of the census, and their results are used
by the Census Bureau to understand the errors so that
action can be taken to make improvements. Chapter [...]
presents information gathered from the various methods
used to evaluate the coverage of the decennial censuses
on differential rates of undercount and overcount among
groups in the population.



APPENDIX 4.1 

AN INTRODUCTION TO ESTIMATION FROM MULTIPLE LISTS 
INTRODUCTION 

Government administration generates large lists
maintained for various purposes. Examples include social
security registration lists, Medicare eligibility lists,
and the Internal Revenue Service individual income tax
return files. Local governmental bodies have other files,
e.g., school enrollment lists, lists of driver's
licenses, and voter registration lists. These are
referred to as administrative records, which have been
put forth as having potential for improving the decennial
census.

Although other applications have been suggested (see
Alvey and Scheuren, 1982), this appendix deals with using
administrative records, or multiple lists, as part of an
estimation process to help determine the "true"
population count. The estimated count can then be used
for coverage evaluation and also, of course, for
adjustment, as suggested by Ericksen and Kadane (1985).
Estimation from multiple lists is performed through the
use of list matching, i.e., the determination of when
information for two individuals on the same or different
lists is in fact information for the same individual.
Appendix 4.2 discusses the operational techniques
involved in list matching. This appendix is mainly
concerned with the statistical estimation models used in
conjunction with list matching, their definition,
purpose, and advantages and disadvantages.

An important advantage in using an administrative
record match for coverage evaluation is that it does not
rely on a household survey or a previous census. As
indicated earlier, surveys tend to miss many of the same
people as censuses. A second advantage is the possibility
of focusing more sharply on the kinds of people most
likely to be missed, by including such lists as AFDC
recipients, or those collecting unemployment insurance.

For the purposes of this discussion, it is convenient
to assume that the matching has been carried out
perfectly, that is, every list contains only correct
information for its members, especially the address,
every list in use has been unduplicated correctly, every
list has been purged of erroneous enumerations, there are
no false matches, and there are no false nonmatches.



TABLE 4.4 Two-Way Table Underlying Simple Dual-System Estimation

                        Caught in the    Missed in the    Population
                        Other List       Other List       Total

Caught in the census    n_{11}           n_{10}           n_{1+}
Missed in the census    n_{01}           n_{00}           n_{0+}
Population total        n_{+1}           n_{+0}           n_{++}

DUAL-SYSTEM ESTIMATION 

The basic model used in estimating the total population
from two incomplete lists is called dual-system
estimation. (As Chapter 4 details, in other contexts it is
referred to as capture-recapture estimation.) The
process may be stated as follows. Assume that one has
two lists of a population, one from the census, and it
matters little if the other list is a sample from a
(possibly theoretical) parent list. Given that one has
observed the total number of individuals on each list,
and the number of individuals jointly on the two lists by
matching, how can the remaining members of the population
be estimated? These quantities lead to four elements of
a two-by-two contingency table, given in Table 4.4. The
resulting question is how one estimates the size of the
n_{00} cell. (For purposes of this discussion, the
important assumption of equal probabilities on the two
lists (or more) will be used. Its impact on the estimate
is discussed above briefly in Chapter 4.)

In order to estimate the size of the n_{00} cell, or
equivalently the n_{++} cell, it is necessary to assume a
model relating the entries of the cells. An assumption
that is commonly made is that the mechanisms for
inclusion in the two lists operate independently, that is,
that the probability of inclusion in the census given
inclusion in the other list is equal to the probability
of inclusion in the census, and that this probability is
the same for everyone. It follows that:

    n_{11}/n_{+1} = n_{1+}/n_{++}                    (4.1)

or equivalently:

    n_{11}/n_{01} = n_{10}/n_{00}.

Therefore, an obvious estimate of n_{++} is

    \hat{n}_{++} = n_{1+} n_{+1} / n_{11}.

More generally we can parametrically specify the
relationship between the two ratios in equation 4.1:

    n_{11}/n_{+1} = k' (n_{1+}/n_{++})               (4.2)

or equivalently:

    n_{11}/n_{01} = k (n_{10}/n_{00}).

In this formulation, it is first necessary to estimate k'
or k in order to calculate n_{++} or n_{00}. When k,
sometimes referred to as the cross-product ratio, is
determined or believed not to equal 1, the mechanism for
inclusion in the two lists is said to have "correlation
bias." We discuss correlation bias and the parameter k
more fully in Chapter 8.

Many modelers have assumed that k equals 1, i.e.,
independence of the two processes, although the validity
of that assumption is usually uncertain. One method of
increasing the intuitive strength of the independence
assumption is to stratify the population of one of the
lists and to use dual-system estimation separately on
each subpopulation. This can be a powerful tool for
lessening the impact of dependence (see Marks et al.,
1977). However, stratification assumes that one
understands the mechanism for being missed on each list
and knows what modes of stratification are appropriate.
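
The dual-system estimate and the effect of stratification can be illustrated with a short sketch. It assumes the usual definition of the cross-product ratio, k = n_{11} n_{00} / (n_{10} n_{01}), so that the unobserved cell is estimated as k n_{10} n_{01} / n_{11}; with k = 1 this reduces to the independence estimate n_{1+} n_{+1} / n_{11}. The function name and the stratum counts are hypothetical.

```python
def dual_system_estimate(n11: float, n10: float, n01: float,
                         k: float = 1.0) -> float:
    """Estimate the population total n_{++} from the three observed cells.

    n11: caught in both the census and the other list
    n10: caught in the census, missed in the other list
    n01: missed in the census, caught in the other list
    k:   assumed cross-product ratio (k = 1 means independence)
    """
    if n11 <= 0:
        raise ValueError("n11 must be positive to form the estimate")
    n00_hat = k * n10 * n01 / n11   # estimated count missed by both lists
    return n11 + n10 + n01 + n00_hat


# Hypothetical counts (n11, n10, n01) for two strata with different
# capture rates.
strata = {"A": (800, 100, 50), "B": (300, 150, 100)}

# Apply the estimator separately within each stratum, then add.
stratified_total = sum(dual_system_estimate(*cells) for cells in strata.values())

# Apply the estimator once to the pooled counts.
pooled_cells = [sum(column) for column in zip(*strata.values())]
pooled_total = dual_system_estimate(*pooled_cells)

print(f"stratified: {stratified_total:.2f}, pooled: {pooled_total:.2f}")
```

In this made-up example the pooled estimate is lower than the stratified one, because pooling strata with different capture rates induces apparent dependence between the two lists even when independence holds within each stratum.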



OTHER ESTIMATION APPROACHES 

There are a number of alternative methods of estimation.
One approach is to obtain and then merge several lists,
then match the resulting composite list to the census.
One would then use dual-system estimation on the resulting
two-by-two table, assuming that k equaled 1 for this
two-by-two table. The merged list is likely to be both
more complete and more representative of the population
than any of the individual lists, and the values of n_{10}
and n_{00} will be smaller. (By representative we mean
that no identifiable population, e.g., demographic group
or persons residing in a specific region of the country,
is missed relatively more often by the composite list
than any other identifiable population.)

One can effectively merge a sample of the noncensus
lists by sequentially matching them and then matching the
merged list with the census. Kadane and Lehoczky (1976)
present an efficient method for ordering the lists to be
sequentially matched. Ericksen and Kadane (1983) applied
this technique to prepare their estimate of the proportion
of New York City's population that was undercounted in
1980.

TABLE 4.5 Multiple List Quantities for the Case of Two Lists and a Census:
Three-Way Contingency Table

                               Included     Missed
                               in Census    in Census    Total

Included in first sample
  Included in second sample    n_{111}      n_{011}      n_{+11}
  Missed in second sample      n_{110}      n_{010}      n_{+10}
  Total                        n_{11+}      n_{01+}      n_{+1+}

Missed in first sample
  Included in second sample    n_{101}      n_{001}      n_{+01}
  Missed in second sample      n_{100}      n_{000}      n_{+00}
  Total                        n_{10+}      n_{00+}      n_{+0+}

Total
  Included in second sample    n_{1+1}      n_{0+1}      n_{++1}
  Missed in second sample      n_{1+0}      n_{0+0}      n_{++0}
  Total                        n_{1++}      n_{0++}      n_{+++}

NOTE: In this table, each subscript indicates inclusion (1) or exclusion (0) in each of three lists,
representing two lists and the census.

Another approach, which we call multilist, is to match
completely all the pairs of lists, including the census.
If there are j noncensus lists, this requires (j+1)j/2
total matches. If one is willing and has the resources
to perform all these matches, then one has many estimates
of the number of people missed by all the lists from
which to choose. The resulting mathematical structure is
a 2^{j+1} contingency table (see Table 4.5 for the notation
for three lists).
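
The structure of the multilist table can be sketched as follows, under the simplifying assumption made throughout this appendix that matching is perfect, so that each person can be represented by a single identifier. The function names and the toy identifiers are hypothetical. Note that the pattern of all zeros never appears in the observed table; that is the cell that has to be estimated.

```python
from collections import Counter
from itertools import combinations


def pairwise_matches(j: int) -> int:
    """Number of list pairs to match with j noncensus lists plus the census."""
    return (j + 1) * j // 2


def inclusion_table(lists: list[set]) -> Counter:
    """Count people by inclusion pattern across the lists.

    The pattern (1, 0, 1) means included in the first and third lists
    and missed by the second. The census is taken to be lists[0].
    """
    everyone_observed = set().union(*lists)
    table = Counter()
    for person in everyone_observed:
        pattern = tuple(int(person in one_list) for one_list in lists)
        table[pattern] += 1
    return table


# Hypothetical identifiers: the census and j = 2 noncensus lists.
census = {"a", "b", "c"}
first_list = {"b", "c", "d"}
second_list = {"c", "e"}
all_lists = [census, first_list, second_list]

table = inclusion_table(all_lists)
list_pairs = list(combinations(range(len(all_lists)), 2))
total_cells = 2 ** len(all_lists)   # 2^{j+1} cells, one of them unobservable
```

With j = 2 there are three pairs of lists to match and eight cells, seven of which can be observed.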

Some estimates of n_{000} are (see Marks et al., 1977):

    [nine alternative estimates, joined by "or"]     (4.3)

and various combinations of these nine estimates. The
use of the first six of these estimates individually
implies something about a parameter similar to the k and
k' above being set equal to 1, but these new parameters
relate to conditional independence assumptions. It is
sometimes believed that these assumptions are more
realistic or less sensitive to failure than the
unconditional assumptions made with only two lists.
Furthermore, by examining other complete subtables, that
is, subtables not involving the missing cell n_{000},
various independence and dependence assumptions about the
incomplete subtables can be checked, in a weak sense. The
last three estimates given in equation 4.3 relate to
tables that are equivalent to the pre-merging of two of
the three lists. At any rate, if n_{000} is smaller than
n_{00} would be, this alone would make the estimation of
n_{+++} less sensitive to assumptions.

A third approach, offered by Cormack (1981) and [...]
et al. (1984), is to use covariate information in
modeling the probability of an individual's capture in
each list. In these papers, the usual log-linear
parameterization of the 2^{j+1} contingency table
resulting from the complete j-way match is replaced by a
parameterization representing, for example, the
probability of being captured in one list given that the
individual was captured in another list. This gives one a
different interpretation of the missing cell, whose entry
is now a function of the parameters estimated from the
nonmissing portion of the table.

There is finally a need in any discussion of multiple
list estimation methods to point out the hazards of
multiple list matching. The grave disadvantage of
multiple list matching is that successful matching
assumes a level of administrative sophistication that is
unlikely to be the case in practice. Although in theory
it seems easy to match individuals between lists, in
practice it is far from easy to avoid false matches,
false nonmatches, and ambiguous matches that may have a
substantial impact on results. All administrative lists
contain errors resulting from intentional misreporting,
unintentional misreporting, or data entry imperfections;
furthermore, records in administrative lists have varying
reference dates, and revision practices will vary
between lists and between individuals within lists. The
inability of the 1980 PEP to match more than 92 percent
of individuals between the 1980 census and the April
CPS despite almost exactly equivalent reference dates and
extensive follow-up investigations is illustrative of the
problems involved. These problems are likely to be even
more severe with the use of administrative lists maintained
for other purposes with different record-keeping
priorities. Some of these issues are discussed in Appendix 4.2.



APPENDIX 4.2 
OPERATIONAL ASPECTS AND MODELING OF COMPUTER MATCHING 



This appendix describes the physical operations and
statistical models underlying the process of matching two
lists (or unduplicating one file). The statistical models
discussed relate to a different issue from the one
addressed in Appendix 4.1, where the models are to be used
after the matching has been completed. The models
discussed here provide one with an objective means of
deciding which records to link as well as estimates of
the resulting error rates. A good overview of the
subject of matching is presented in Federal Committee on
Statistical Methodology (1980).



OPERATIONAL DIFFICULTIES OF MATCHING 

Before proceeding, in order to appreciate the overall 
cost structure of matching, it is helpful to understand 
the social and operational obstacles to the use of 
administrative records in list matching. A major concern 
is the public reaction to any possible encroachment on 
the confidentiality and privacy of their individual 
records. In the interests of satisfying respondent 
concerns over the privacy of the census questionnaires, 
neither names nor addresses of respondents, except for a 
small sample of households involved in a few components 
of the coverage evaluation program, were recorded in 
machine-readable form in the 1980 decennial census. 
Thus, in 1980, the Census Bureau was able to assert to
uneasy respondents that its files could not be used for
large-scale government record linkage. For 1990, the
current plans of the Census Bureau are to capture in
machine-readable form both name and address for input
into a computer matching algorithm. However, the
potential for actual invasion of privacy is not very 
great in coverage evaluation programs, since the number 
of persons involved is such a small percentage of the 
total population. 

Operationally, there are serious problems in list 
matching administrative records. First, the quality of 
the information on many lists is not good. There are 
duplications, erroneous entries, and missing data. For 


example, the addresses on social security files, except
for beneficiaries, are not current. In addition, the
more lists that are merged, the higher the chance of
significant undetectable duplication on the merged list,
which, when matched against the census, will inflate the
estimated undercount.

Many lists, for example the census itself, do not have
an identification number, such as the social security
number, to facilitate matching. Without such an
identification number, matching must make use of such items as
name, address, age (possibly birth date), sex, and race.
Unfortunately, addresses in administrative records are
frequently not the residence address required for a
match but a mailing address or an obsolete address.
Finally, the combination of the above complications
usually results in a large percentage of cases for which
match status cannot be resolved. The treatment of these
unresolved cases can have a substantial impact on the
resulting undercoverage estimates.



GENERAL ALGORITHM OF MATCHING 

The general algorithm underlying matching is based on the
similarity of the information for two individuals under
comparison in two files. Every record from one of the
two files is assigned a 0 or a 1, denoting, say, unmatched
or matched. (Later on, we will see that there is an
important third class that is assigned a "do not know"
status.) Thus, two candidates for matching are
identified. If their characteristics, for example, name,
address, sex, age, race, and birth date, are the same or
close, the algorithm will assign them a "1" for match
status. In the absence of a single error-free
identification number, the more discriminating the matching
characteristics (e.g., birth date as opposed to sex), the
fewer the false matches; and the more error-free the
characteristics, the fewer the false nonmatches. The key
to matching well is weighting the similarity or
dissimilarity of the characteristics information so that few
individuals are misclassified.
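A minimal sketch of such weighting follows. The field names, weights, and cutoffs are hypothetical and chosen only for illustration; the point is that a discriminating field such as birth date moves the score far more than sex does, and that pairs falling between the two cutoffs receive the third, undecided status.

```python
# Hypothetical weights: more discriminating fields count for more.
WEIGHTS = {"name": 4.0, "address": 3.0, "birth_date": 5.0, "sex": 0.5, "race": 0.5}


def similarity(rec_a, rec_b, weights=WEIGHTS):
    """Add a field's weight when values agree, subtract it when they disagree.

    Fields missing on either record contribute nothing.
    """
    score = 0.0
    for field, w in weights.items():
        va, vb = rec_a.get(field), rec_b.get(field)
        if va is None or vb is None:
            continue
        score += w if va == vb else -w
    return score


def match_status(rec_a, rec_b, link_at=6.0, reject_at=0.0):
    """Return 1 (matched), 0 (unmatched), or None (the "do not know" status)."""
    s = similarity(rec_a, rec_b)
    if s >= link_at:
        return 1
    if s <= reject_at:
        return 0
    return None


a = {"name": "J SMITH", "address": "12 ELM ST", "birth_date": "1950-03-02",
     "sex": "M", "race": "W"}
b = dict(a, address=None)                                  # address missing
c = dict(a, name="R JONES", birth_date="1961-11-20")       # key fields differ
d = dict(a, address="9 OAK AVE", birth_date=None)          # mixed evidence
print(match_status(a, b), match_status(a, c), match_status(a, d))
```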



BLOCKING 

Matching, when one of the lists is as extensive as the
decennial census, and even for considerably smaller
lists, cannot possibly be done by searching the entire
large file for each record on the other list. Therefore, 
the search is limited to a likely subset of the census 
file. To define this subset, one or more characteristics 
are chosen, then equivalence classes or blocks of re- 
sponses for these characteristics are identified, which 
have the property that when two records are located in 
different blocks, the likelihood of a match is considered 
to be quite small. When such a variable, or variables, 
are found, the large file (the decennial census file) is 
restructured so that only those records that are located 
in the same block are checked for possible matches. 
Blocking necessarily involves a trade-off: incurring 
some number of false nonmatches in order to reduce costs 
by a significant margin. 
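The restructuring described above can be sketched as follows. The record layout and the choice of enumeration district ("ed") as the blocking key are assumptions made for illustration; any characteristic with the stated property could serve as the key.

```python
from collections import defaultdict


def build_blocks(records, key):
    """Group the records of the large file by their blocking key."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[key(rec)].append(rec)
    return blocks


def candidate_pairs(small_file, large_file, key):
    """Yield only those pairs whose records fall in the same block."""
    blocks = build_blocks(large_file, key)
    for rec in small_file:
        for cand in blocks.get(key(rec), []):
            yield rec, cand


# Hypothetical records, blocked on enumeration district.
census = [{"id": 1, "ed": "A"}, {"id": 2, "ed": "A"},
          {"id": 3, "ed": "B"}, {"id": 4, "ed": "B"}]
survey = [{"id": "s1", "ed": "A"}, {"id": "s2", "ed": "B"}]

pairs = list(candidate_pairs(survey, census, key=lambda r: r["ed"]))
print(len(pairs), "comparisons instead of", len(survey) * len(census))
```

The saving grows with file size, and the cost is the one named in the text: a true match whose two records carry different block values is never compared and becomes a false nonmatch.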

The question of how to block effectively has a statistical
framework initially articulated by Fellegi and
Sunter (1969). Kelley (1984b) has also investigated this
issue.

In 1980, in the Post-Enumeration Program, the blocking 
used was geographic. Possible matches were searched for 
only in the same enumeration district as that of the 
address of the Current Population Survey interviewee. 
This was necessary due to the limitations of a clerical 
match. However, the Census Bureau is developing and 
testing sophisticated software to reduce, possibly to 
under 40 percent, the percentage of the matching that 
will need to be handled clerically in 1990. This auto- 
mation of the matching process should also allow less 
restrictive blocking to be used. 



VARIABLE SELECTION 

The selection of identifying variables available in both 
files can be an important aspect of matching algorithms. 
However, in the particular case of the coverage evalua- 
tion programs for the decennial census, variable selection 
is more or less predetermined. This is because matching 
requires the use of short-form information only, such as 
name, address, age, sex, and race. There are not a great 
many more variables on the short form to choose from that 
would help determine match status. When one has more 
latitude, variable selection is driven by two considera- 
tions, as mentioned above: (1) the quality of the 
response for that variable and (2) the discriminating 
power of the variable. 



A MATHEMATICAL MODEL FOR RECORD LINKAGE 

Fellegi and Sunter (1969) developed a mathematical model
for matching, or record linkage, which the Census Bureau
is using in its software system for automated matching.
A brief description follows.

First, assume that one wishes to match files A and B.
The set of ordered pairs from these two files, A × B, is the
disjoint union of two sets:

$$M = \{(a,b) : a = b,\ a \in A,\ b \in B\} \quad \text{(the truly matched set)}$$
and
$$U = \{(a,b) : a \neq b,\ a \in A,\ b \in B\} \quad \text{(the truly unmatched set)}.$$

In order to determine the match status of any two records,
it is necessary to compare their sets of characteristics.
To do this, a comparison vector is created, which is a
vector function of the records, x(a) and x(b), or, more
precisely, of the common identification information contained
in the records. This comparison vector is written:

$$g[x(a),x(b)] = \{\, g_1[x(a),x(b)],\ \ldots,\ g_k[x(a),x(b)] \,\}.$$

Typically, g_i[x(a),x(b)] is the coded result of the
nature of agreement or disagreement between the ith
identifying variable (e.g., age) of the two records. All
matching inferences about the records (a,b) are made on
the basis of the vector g, which codifies the pattern and
nature of agreement and disagreement between the pair of
records a and b. Three possible decisions can be made
based on an examination of the vector function g: (1)
decide to link a and b, (2) decide not to link a and b,
or (3) choose not to make a decision. This decision
rule, which we will denote as d(g), is a function of the
comparison vector.
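As a rough sketch of how a comparison vector might be coded, the function below records simple agreement or disagreement for each field. The field list and the three-valued coding (1, 0, or None for missing) are assumptions for illustration; the text allows richer codings of the nature of the agreement.

```python
FIELDS = ("name", "address", "age", "sex", "race")  # hypothetical identifying variables


def comparison_vector(x_a, x_b, fields=FIELDS):
    """Code each field as 1 (agree), 0 (disagree), or None (missing on either record)."""
    g = []
    for f in fields:
        va, vb = x_a.get(f), x_b.get(f)
        g.append(None if va is None or vb is None else int(va == vb))
    return tuple(g)


rec_a = {"name": "M LOPEZ", "address": "4 PINE RD", "age": 34, "sex": "F", "race": "W"}
rec_b = {"name": "M LOPEZ", "address": "7 BAY ST", "age": 34, "sex": "F", "race": None}
print(comparison_vector(rec_a, rec_b))
```

A decision rule d(g) then needs only this tuple, not the underlying records, to choose among linking, not linking, and making no decision.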

Now, given that one is taking a set of random pairs
from A × B, one must consider the likelihood of observing
the comparison vector g given, respectively, that the
pair (a,b) came from M or came from U. These are written:

$$m(g) = P\{\, g[x(a),x(b)] \mid (a,b) \in M \,\}$$
and
$$u(g) = P\{\, g[x(a),x(b)] \mid (a,b) \in U \,\}.$$

The two types of errors associated with a linkage rule
are: (1) deciding to link unmatched individuals and (2)
deciding not to link matched individuals. Setting these
two errors respectively to e_1 and e_2, Fellegi and
Sunter defined a decision rule d(g) as optimal if, among
all rules that have errors of type 1 less than or equal
to e_1 and of type 2 less than or equal to e_2,
the probability of making no decision is smallest.

The above problem was shown by Fellegi and Sunter to
yield an optimal decision rule, which, roughly, decides
to link all the cases in which the ratio m(g)/u(g) is
high and not to link all the cases in which m(g)/u(g) is
low. This is equivalent to a likelihood ratio test. The
two threshold levels (above which to link and below which
not to link) are determined to yield the two types of
error, e_1 and e_2, at prespecified levels.
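The two-threshold rule can be sketched as below. Two simplifications are assumptions of this sketch and not statements from the text: agreement on each field is treated as independent of the other fields within M and within U, so that m(g)/u(g) factors into per-field ratios, and the thresholds are fixed by hand instead of being derived from target error levels e_1 and e_2. All probabilities are hypothetical.

```python
import math

# Hypothetical per-field probabilities of agreement among true matches (M_PROB)
# and among true nonmatches (U_PROB); in practice these must be estimated.
M_PROB = {"name": 0.95, "birth_date": 0.90, "sex": 0.98}
U_PROB = {"name": 0.01, "birth_date": 0.003, "sex": 0.50}


def log_ratio(g, m=M_PROB, u=U_PROB):
    """Return log2 of m(g)/u(g), assuming fields agree independently within M and U."""
    total = 0.0
    for field, agrees in g.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


def decide(g, upper=8.0, lower=0.0):
    """Link when the ratio is high, do not link when it is low, otherwise defer."""
    w = log_ratio(g)
    if w >= upper:
        return "link"
    if w <= lower:
        return "no link"
    return "no decision"


print(decide({"name": True, "birth_date": True, "sex": True}))
print(decide({"name": False, "birth_date": False, "sex": False}))
print(decide({"name": True, "birth_date": False, "sex": True}))
```

Widening the gap between the two thresholds lowers both error rates but sends more pairs to the undecided class, which is the trade-off the optimality criterion addresses.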

The difficulty that remains is that of estimating m(g)
and u(g). Fellegi and Sunter provide two methods for
accomplishing this. One might have prior knowledge from
other studies of the errors to which variables involved
in linkage are subject. The errors might also be
estimated on the basis of comparing a small sample of
records that are known to match and observing the lack of
agreement in the identifying variables (see Arellano et
al., 1984). Iterative procedures are also conceivable in
which initial match status is determined, which then
provides more refined estimates of m(g) and u(g).
Finally, we note that records that cannot be matched at
the level of predetermined error rates e_1 and e_2 have
to be followed up either through clerical matching or by
obtaining additional identifying information for them.
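The sample-based approach mentioned above, comparing records known to match and observing how often each identifying variable agrees, might look like the following sketch. The function name and the sample are hypothetical, and the result is a per-field agreement rate, which is a simplification of estimating m(g) for every possible comparison vector.

```python
def estimate_m(known_matched_pairs, fields):
    """Estimate, for each field, the share of known matched pairs whose values agree."""
    n = len(known_matched_pairs)
    if n == 0:
        raise ValueError("need at least one known matched pair")
    return {f: sum(a[f] == b[f] for a, b in known_matched_pairs) / n
            for f in fields}


# Hypothetical sample of four pairs known to refer to the same person.
sample = [
    ({"name": "A REED", "age": 41}, {"name": "A REED", "age": 41}),
    ({"name": "B COLE", "age": 29}, {"name": "B COLE", "age": 29}),
    ({"name": "C DIAZ", "age": 63}, {"name": "C DIAS", "age": 63}),  # name misspelled
    ({"name": "D KING", "age": 18}, {"name": "D KING", "age": 18}),
]
print(estimate_m(sample, ("name", "age")))
```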



5 

Taking the Census I: 
Improving the Count 



The charge to the Panel on Decennial Census Methodology
called for investigation of methods of conducting the
decennial census that could prove more cost-effective
than the methodology used in 1980. The 1980 methodology,
as described in Chapter 3, included numerous programs
designed to improve coverage in hard-to-count areas and
of hard-to-count populations and stipulated that all
follow-up and coverage improvement operations be carried
out as completely as possible. The panel was asked to
consider possible alternative methodologies, for example,
a methodology that would incorporate adjustment for
coverage and content errors. Adjustment, if appropriate
methods can be developed and implemented, might not only
increase accuracy but also lessen costs by leading to a
decision to give somewhat less emphasis to coverage
improvement programs during the conduct of the census.
Similarly, the panel was asked to consider the use of
sampling for the count and of administrative records as
means of reducing costs compared with the 1980
methodology.

Most programs directed toward coverage improvement are
expensive, both in absolute terms and often in terms of
the cost per person or housing unit identified and added
to the census. Moreover, some coverage improvement
programs as well as other census procedures may have
introduced some overcounts in 1980 by duplicating persons or
otherwise erroneously adding persons. In general,
however, the panel believes that the costs of well-designed
and well-executed coverage improvement programs represent
money well spent for improving the count. The panel, at
the beginning of its work, identified as a key issue that
of reviewing coverage improvement methods with the purpose
of identifying particularly promising approaches that


should be part of the methodology for conducting the 
enumeration. 

This chapter begins by summarizing the literature on 
what is known about the characteristics of hard-to-count 
areas and groups in the population to provide the neces- 
sary background for evaluating the cost-effectiveness of 
coverage improvement programs. The section also
summarizes what is known about the problem of overcounting.
(Appendix 5.1 provides a more detailed review of the
literature on undercounting and overcounting.)

The chapter then reviews the history of efforts 
directed specifically toward coverage improvement in both 
the 1970 and 1980 censuses and the Census Bureau's plans 
for testing coverage improvement methods for 1990. 
Finally, the chapter presents the panel's recommendations 
for priority areas for research and testing with regard 
to coverage improvement. 



HARD-TO-COUNT GROUPS IN THE CENSUS: WHAT IS KNOWN 
Experience in 1980 

Evaluation studies of the completeness and accuracy 
achieved in the 1980 census are still in progress. 
Estimates published to date, based on the method of
demographic analysis, show the rate of net undercount for
the total population in the range of 0.5 to 1.4 percent, 
depending on the estimate of the number of resident 
undocumented aliens in the country (see Table 5.1). The 
highest net undercount rate estimated by demographic 
analysis for 1980 (1.4 percent) is about three-fifths of 
the rate estimated for 1970 (2.2 percent) and only two- 
fifths of the 1950 rate (3.3 percent). The differential 
rate of undercoverage between the black population and 
all others has narrowed somewhat for the nation as a 
whole, as the table shows. The differential in 1980 of 
5.5 percentage points between net undercount rates for 
blacks and all other legal residents is about three- 
fourths of the 1950 differential of 7.2 percentage 
points. However, the 1980 differential is over 90 
percent of the 1960 and 1970 differentials of about 6 
percentage points, and most of the gain achieved by 1980 
in narrowing the differential resulted from better 
coverage of black women and not black men.
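The fractions quoted in this paragraph follow directly from the rates given in the text, as the short check below shows. The variable names are hypothetical; the 1960 and 1970 differentials are entered as 6.0 because the text describes them only as "about 6" percentage points.

```python
# Net undercount rates (percent) from demographic analysis, as quoted in the text.
net_undercount = {1950: 3.3, 1970: 2.2, 1980: 1.4}  # 1980 value is the highest estimate

# Black versus all-other differentials (percentage points), as quoted in the text.
differential = {1950: 7.2, 1970: 6.0, 1980: 5.5}    # 1970 value is approximate

ratio_1980_to_1970 = net_undercount[1980] / net_undercount[1970]  # about three-fifths
ratio_1980_to_1950 = net_undercount[1980] / net_undercount[1950]  # about two-fifths
diff_1980_to_1950 = differential[1980] / differential[1950]       # about three-fourths
diff_1980_to_1970 = differential[1980] / differential[1970]       # over 90 percent

print(round(ratio_1980_to_1970, 2), round(ratio_1980_to_1950, 2),
      round(diff_1980_to_1950, 2), round(diff_1980_to_1970, 2))
```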

Rates of gross and net undercount in 1980 varied by
population group and by geographic area, with rates [...]

Demographic analysis provides independent estimates of
the national population by age, race, and sex that, when
compared with the census counts for these categories, [...]




cities of large standard metropolitan statistical areas
(SMSAs with 3 million or more population) had a moderately
high gross omission rate compared with the average. Areas
enumerated using conventional techniques rather than a
mailout-mailback approach had a below-average rate.

(4) Cross-classifying ethnicity and type of place by
the mail nonreturn rate for the district office (that is,
100 percent minus the percentage rate at which
questionnaires were mailed back from households) produced striking
differences in gross omission rates. Blacks and Hispanics
in district offices with mail nonreturn rates of 30
percent or higher exhibited gross omission rates more than
three times the average, while the gross omission rate
for blacks in district offices with mail nonreturn rates
of under 15 percent was only moderately above the average
rate and the gross omission rate for the corresponding
group of Hispanics was close to the average. Similarly,
central cities of both large and small SMSAs with mail
nonreturn rates of 35 percent or higher had gross omission
rates more than three times the average, while those
cities with mail nonreturn rates below 10 percent had
below-average rates.
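The mail nonreturn rate defined in item (4) is a one-line computation. The sketch below applies it to hypothetical district offices and flags those at or above the 30 percent level mentioned in the text; the office names and counts are invented for illustration.

```python
def mail_nonreturn_rate(mailed_back, mailed_out):
    """Return 100 percent minus the percentage of questionnaires mailed back."""
    if mailed_out <= 0:
        raise ValueError("mailed_out must be positive")
    return 100.0 - 100.0 * mailed_back / mailed_out


# Hypothetical district offices: (questionnaires mailed back, questionnaires mailed out).
offices = {"office_1": (880, 1000), "office_2": (700, 1000), "office_3": (640, 1000)}

rates = {name: mail_nonreturn_rate(back, out) for name, (back, out) in offices.items()}
hard_to_count = sorted(name for name, r in rates.items() if r >= 30.0)
print(rates, hard_to_count)
```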

Mail nonreturn rate appears to be a good indicator of
gross omissions. Of course, the mail nonreturn rate is a
symptom and not a cause of various problems pertaining to
an area that result in higher-than-average rates of
omissions (including not only the unwillingness of persons to
be counted but also problems related to census procedures
such as difficulty in delivering mail to individual
households in some multiunit structures). Nevertheless, the
mail nonreturn rate appears to provide valuable
information to locate geographic areas in which coverage is
particularly difficult. Further research on the
characteristics of areas with high mail nonreturn rates that
could assist development of effective coverage improvement
techniques is hampered by the small sample sizes in the
PEP for these areas. Moreover, at present, information
on socioeconomic characteristics of the nonmatched PEP
cases (for example, income and occupation) that might be
useful to examine along with demographic and geographic
characteristics is not in a ready form for analysis at
the Census Bureau. (Fellegi, 1980a, provides estimates
by a broad range of characteristics for persons missed in
the 1976 census in Canada, as estimated by the reverse
record check methodology.)




As already noted, the whole story regarding coverage
problems in the census does not emerge solely by looking
at gross omissions. In every census, some persons and
housing units are counted more than once or are otherwise
erroneously included (for example, via "curbstoning" or
counting as an occupied unit one that was actually vacant
on Census Day). The phenomenon of overenumeration may
have more to do with census procedures, for example,
quality control of the address list, than with the
propensities of persons to be counted; nevertheless, it
is necessary to examine gross overenumerations as well as
omissions to obtain a complete picture.

With regard to gross overenumerations, the PEP results
indicate the following patterns:

(1) Population groups with relatively high gross 
omission rates also tended to have relatively high rates 
of gross overenumerations. However, the dispersion in
gross overenumeration rates was less than the dispersion 
in gross omission rates. 

(2) By ethnicity categories, blacks, most Hispanics, 
and members of other nonwhite races had gross over- 
enumeration rates moderately above the average. By house- 
hold relationship, persons not related to the household 
head and relatives other than parent, child, or spouse 
had moderately high gross overenumeration rates relative 
to the average. 

(3) Gross overenumeration rates also varied by type 
of enumeration procedure. Within mailout-mailback areas, 
enumerations obtained through follow-up for nonresponse 
exhibited a rate of gross overenumerations more than twice 
the average rate, while enumerations resulting from mail 
returns exhibited a below-average rate. Enumerations 
obtained in conventional areas also had a below-average 
rate of gross overenumerations. 



The IRS-Census Match 

A methodological study conducted after the 1980 census, 
the IRS-Census Match, provides, as a by-product, informa- 
tion indicative of differential rates of gross omissions 
from the census. (The Internal Revenue Service provided 
a sample of tax returns to the Census Bureau for the 
analysis but had no access to the census data for these 
returns.) This study, which matched a sample of about 
11,000 filers of 1979 tax returns to 1980 census records, 




found the following patterns of gross omission rates (see
Appendix 5.1 for further details):

(1) Categorizing tax filers by sex and ethnicity, black
men had a gross omission rate more than twice the average
for the study, while white women had a below-average rate.

(2) Categorizing tax filers by marital status (proxied
by joint versus single return) and income level, blacks
filing single returns at all income levels and most
Hispanics filing single returns had gross omission rates
more than twice the average, as did blacks filing joint
returns with low incomes (less than $8,000) and most
Hispanics filing joint returns. In contrast, blacks
filing joint returns and whites filing single returns
with higher incomes ($15,000 or more) and most whites
filing joint returns had below-average gross omission
rates.



Experience From Previous Censuses 

Coverage evaluation programs for previous censuses
provide additional information about groups in the population
that are more apt to be undercounted compared with other
groups. It is important to look at data available from
previous censuses both for clues as to the correlates of
the undercount and also to determine if there are any
patterns over time. That is, are some population groups
apparently getting easier to count and others harder to
count? Any time patterns that can be discerned have
implications for choice of coverage improvement methods
in the next census. Unfortunately, only the
post-enumeration survey program for the 1950 census provided
separate gross overcount as well as undercount figures,
and the great differences in enumeration methods make it
hard to compare the 1950 with the 1980 results.



Demographic Analysis 

Previous censuses show similar patterns, though higher
levels, of net undercount for broad population groups as
in 1980, using the method of demographic analysis. In
every census since detailed coverage analysis began in
1950, blacks were more poorly counted than others and men
more poorly counted than women (see Table 5.1).



[Figure 5.2 plot not reproduced: vertical axis, percent; horizontal axis, years of age from 5 to 65 and over; curves labeled 1960 and 1980.]

FIGURE 5.2 Percentage net undercount rates by age for
black men: 1960-1980 censuses (determined from
demographic analysis).



Looking at patterns of undercount for more finely
stratified age, race, and sex groups reveals some
intriguing differences over time. Black men of working
age were the most heavily undercounted group in 1980.
This has also been true in previous censuses, but the
data show a shift in the age groups most affected (see
Figure 5.2). In 1960, black men ages 15-39 were most
heavily undercounted; in 1970, the age group experiencing
the greatest undercount among black men had shifted to
the range from 20 to 49; in 1980, black men with the
greatest undercount rates were in the age range from 25
to 54.

This pattern does not clearly support a conclusion 
that undercount among black males is age-specific nor a 
conclusion that high rates of undercount are specific to 
a particular cohort of the population. Nevertheless, the 
data suggest that a group of black men who were ages 
15-34 in 1960 is still proving particularly hard to count 
as the cohort grows older. The data also suggest that in 
every census young black men age 20 and over are much 
harder to count than black male teenagers. The phenomenon 
of black children under age 10 of both sexes being
relatively hard to count appears to be a new pattern evident
in 1970 and 1980 but not 1960 (based on data not shown
for black female children as well as the data shown for
males).



Post-enumeration Surveys

Post-enumeration surveys conducted in previous censuses
provide data on relative rates of undercoverage for
various population groups. Appendix 5.1 reviews the findings
of these surveys in detail. Highlights of the survey
results include:

(1) With respect to household relationship, the 1950
and 1960 survey results corroborate the finding from the
1980 PEP that persons not belonging to the nuclear family
are harder to count than household heads, spouses, and
their children.

(2) Survey data from 1950 suggest that fewer years of
schooling are associated with a higher-than-average gross
omission rate.

(3) Findings with regard to labor force status,
occupation, and income are mixed. The 1960 survey found
relationships of low income and unemployment to higher
rates of gross omissions, but, in the case of income, the
relationship appeared stronger for whites compared with
blacks. Both the 1960 and 1950 survey results estimated
high gross omission rates for persons employed as
agricultural laborers, while farmers and farm managers had
below-average gross omission rates.



Resident Observer Studies 

The techniques of resident observation employed in
ethnographic studies were used on one occasion to
investigate factors affecting the coverage of household
surveys. The findings from this study support and extend
the findings about hard-to-count groups in the census
based on traditional methods of coverage evaluation.
Appendix 5.1 provides a full description. 



Housing Coverage Studies 

Rates of omission of housing units do not necessarily 
translate into comparable rates of missed persons; 




nevertheless, studies of completeness of coverage of 
housing units conducted in every census since 1950 are 
another source of information on relative rates of gross 
omissions in the population. Persons can be missed in 
the census because the entire household is overlooked or 
because one or more persons in an otherwise enumerated 
household are missed. In 1950, the post-enumeration 
survey indicated that three-quarters of all missed persons 
were in whole households that were missed, while only 
one-quarter were in enumerated households (Bureau of the 
Census, 1960:Table C). By 1970, this distribution had
changed: only half of missed persons were in missed 
households and the other half were in otherwise enumerated 
households. Among blacks, nearly three-quarters of those 
missed were in enumerated households (Siegel, 1975).
With improvements in compilation and review of the address 
list used for the census, the remaining problem of cover- 
age has, to a great extent, shifted from a problem of 
locating structures to one of finding everyone who is 
associated with a particular household. Appendix 5.1 
reviews findings from the 1980 census and previous 
censuses on characteristics of missed housing units. 



COVERAGE IMPROVEMENT PROGRAMS: PAST EXPERIENCE 

In past censuses, the Census Bureau has implemented 
programs designed to improve coverage. Those programs 
have included general advertising and publicity to 
increase awareness of the census and encourage response, 
programs directed toward improving the quality of staff 
and operational procedures, and, finally, special programs 
targeted specifically to known problem areas. This sec- 
tion reviews the special coverage improvement programs 
implemented in 1970 and 1980 to address specific problem 
areas. 



Coverage Improvement in 1970 

The Census Bureau adopted specific coverage improvement 
procedures for the 1970 census predicated on three assump- 
tions: 

(1) The need for even greater accuracy in the
population count than achieved in the past because of the use
of the data for legislative redistricting under "one man,
one vote" court requirements and the growing use of the
data for fund allocations.

(2) The perception that it was becoming increasingly 
difficult to obtain a complete count in the absence of 
additional coverage efforts. 

(3) The belief that new methods would be required to 
effect any coverage improvement. As a history of the 
coverage improvement efforts in 1970 notes (Bureau of the 
Census, 1974b:1):

The 1950 and 1960 programs were predicated on the
assumption that undercounts were due largely to
the enumerator's failure to follow instructions.
Hence, stress was placed on simplified procedures,
training, and quality control. Analysis of the
results of the 1960 evaluation program . . .
indicated that the reasons were more complex. In
particular, a substantial part of the undercount
appeared to be due either to deliberate attempts 
by some segments of the population to be omitted 
from the census or to the fact that they did not 
fit into any households by the conventional rules 
of residence. Even where the undercount was due 
to complete households being missed, the causes 
were frequently such that additional enumerator 
training, exhortation to the enumerators, and 
similar approaches appeared potentially capable of 
only marginal gains. 

Programs to encourage public cooperation with the 
census, particularly among hard-to-count groups, were 
important components of the Census Bureau's strategy to 
obtain complete coverage in 1970. These programs included 
public information efforts and community education pro- 
grams, assistance centers set up in 20 cities that the 
public could call or visit for help in filling out census 
forms, and providing instruction sheets and questionnaires 
in Spanish and Chinese where needed. Special efforts to 
improve enumerator performance in the 20 largest cities 
were also adopted. 

The Census Bureau also implemented specific coverage 
improvement programs designed to add housing units and 
persons to the count, most of which were also used in the 
1980 census. These programs are identified in Table 5.2,
which indicates the number of housing units and persons
added by each program, total costs, and costs per housing
unit and person added (all costs are in 1980 dollars).
The table categorizes the programs as: (1) programs 
carried out prior to Census Day with the primary purpose 
of correcting the address list, both in terms of entire 
structures and units within structures, (2) programs 
carried out during the data collection phase and designed 
to locate missed units within structures or to verify the 
occupancy status of listed units, and (3) programs 
carried out during data collection and designed to add 
missed persons. Note that the cost estimates provided 
are only approximate, as are the estimates of numbers of 
housing units and persons added to the count. 

In brief, the 1970 coverage improvement programs 
included: 

1.1 Advance Post Office Check (APOC). The APOC
involved a check of the address list carried out from 
February through October 1969 by the U.S. Postal Service 
in areas for which the Census Bureau purchased commercial 
mailing lists. These areas included about three-quarters 
of the mailout-mailback population or 45 percent of the 
total population. 

1.2 Precanvass. The Precanvass was an additional
check that Census Bureau enumerators made several weeks 
before Census Day of the address list in selected 
enumeration districts of 17 large metropolitan areas 
expected to prove difficult to count. The enumerators 
concentrated on identifying multiple units within 
structures. 

1.3 Casing and Time of Delivery Checks. These checks 
involved review of the address lists by the Postal Service 
just prior to Census Day both in mailout areas for which 
the Census Bureau purchased lists and in prelist areas 
for which Census Bureau enumerators developed the mailing 
list. 

2.1 National Vacancy Check. In the National Vacancy
Check, the Census Bureau carried out a sample survey of 
about 13,500 housing units originally classified as vacant 
to determine their occupancy status. On the basis of the 
findings, imputation procedures were used to reclassify 
8.5 percent of all vacant units as occupied and to impute 
persons to these units. 

2.2 Post-Enumeration Post Office Check (PEPOC). The PEPOC
was administered in conventionally enumerated areas of 16
Southern states. The Postal Service checked the address
lists developed by enumerators for completeness, and Census
Bureau staff followed up a sample of missed addresses in
the field. On the basis of this effort, housing units
and persons were added to the census records via
imputation.

2.3 Report of Living Quarters Check. This check
involved comparing respondents' answers to the question
about number of living quarters at their address with the
number recorded on the census address list. For
structures listed as having fewer than 10 units, for which
the respondent indicated a greater number of units than
appeared in the census list, enumerators made a field
verification of the number of units.

3.1 Missed Persons Campaign. In this operation the
Census Bureau left cards with community and other local
organizations to distribute to persons in casual settings
such as carry-outs, barbershops, etc. The cards, which
asked for minimal demographic information, were to be
returned to the Census Bureau to match to the census
records.

3.2 Movers Check. In the same metropolitan areas in
which the Precanvass was conducted, the Census Bureau
attempted to follow up persons reporting a change of
address to the Postal Service during the census
enumeration period.

3.3 Supplemental Forms Operation. The Census Bureau
mounted special "Were you counted?" campaigns and
enumerated persons who came forward on special forms.
Residents traveling overseas were also enumerated with
supplemental forms. In most cases, these forms were
processed for an area and persons added only when the
total number of supplemental forms represented 1 percent
or more of the enumeration district population.
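
The processing rule described for the Supplemental Forms Operation can be sketched as a simple threshold test. This is a minimal illustration only: the function name and the example district are hypothetical, and the source notes that the rule applied "in most cases," so the actual procedure had exceptions that are not modeled here.

```python
from fractions import Fraction


def meets_supplemental_threshold(forms_received: int,
                                 ed_population: int,
                                 threshold: Fraction = Fraction(1, 100)) -> bool:
    """Return True when supplemental forms amount to at least `threshold`
    (1 percent by default) of the enumeration district (ED) population."""
    if ed_population <= 0:
        raise ValueError("ed_population must be positive")
    # Exact rational arithmetic avoids floating-point surprises at the boundary.
    return Fraction(forms_received, ed_population) >= threshold


# Hypothetical enumeration district of 800 persons.
print(meets_supplemental_threshold(5, 800))  # 5/800 is below 1 percent
print(meets_supplemental_threshold(8, 800))  # 8/800 is exactly 1 percent
```

Under such a rule, scattered forms in a district are never processed, which is one reason the operation could add only a small number of persons.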

The programs in category 1 added about 4.5 percent to
the housing unit and person count and were reasonably
cost-effective (recognizing that cost-effectiveness of
coverage evaluation programs is difficult to measure,
particularly in the absence of information regarding the
proportions of housing units and persons correctly added
to the count, that is, not overcounted). The APOC added
1.7 percent to the overall housing unit count (3.8 percent
in the commercial mailing list areas in which the program
was conducted) in addition to correcting many addresses.
The Casing and Time of Delivery Checks added 2.6 percent
overall and fully 4.4 percent in the mailout areas in
which these checks were performed. The Precanvass added
only 0.2 percent to the total housing unit count, but the
program was implemented in selected areas of only 17
metropolises. In these selected areas, the Precanvass
added 2.3 percent to the housing unit count.
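
The overall and within-area percentages above are linked by the share of all housing units that lay in each program's areas: if a program adds a given percent within its areas, the overall percent is that figure times the area share. The sketch below back-calculates the implied share from the figures quoted in the text. It assumes both percentages use the same kind of base (housing units); the function name is illustrative.

```python
# (percent added overall, percent added within program areas), from the text
programs_1970 = {
    "APOC": (1.7, 3.8),
    "Casing and Time of Delivery Checks": (2.6, 4.4),
    "Precanvass": (0.2, 2.3),
}


def implied_area_share(overall_pct: float, in_area_pct: float) -> float:
    """Share of the national total located in the program's areas,
    implied by overall_pct = in_area_pct * share."""
    if in_area_pct <= 0:
        raise ValueError("in_area_pct must be positive")
    return overall_pct / in_area_pct


for name, (overall, in_area) in programs_1970.items():
    share = implied_area_share(overall, in_area)
    print(f"{name}: program areas hold about {share:.0%} of the total")
```

For the APOC the implied share is roughly 45 percent, which agrees with the statement that the commercial mailing list areas contained about 45 percent of the total population.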

The 1970 programs carried out during the data collec- 
tion phase and aimed at checking the count of housing 
units and their occupancy status (category 2) proved 
cost-effective as well, although these programs added a 
much smaller percentage to the population count than the 
address check programs. The National Vacancy Check added 
0.5 percent to the population count and reclassified 0.4 
percent of total housing units from vacant to occupied. 
The program cost very little per added person, because it 
was carried out on a small sample (about 0.2 percent) of 
units originally classified as vacant. Of course, in 
determining the cost-effectiveness of a coverage improve- 
ment program based on a sample survey, one must look not 
only at the cost per added person but at the reliability 
of the data obtained. A program with a smaller sampling 
fraction will cost less on a per person added basis com- 
pared with a more extensive program, but may also produce 
less reliable data. 
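
The trade-off between sample size and reliability can be illustrated with the standard error of an estimated proportion. This is a sketch under simplifying assumptions that are not in the source: simple random sampling, no finite population correction, and no design effect. The proportion 0.114 is the sample reclassification rate reported for the 1970 National Vacancy Check; the alternative sample sizes are hypothetical.

```python
import math


def proportion_standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion under simple random sampling."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


p_reclassified = 0.114  # share of sample units found occupied (from the text)

# 13,500 is the actual 1970 sample size; the other sizes are hypothetical.
for n in (1_350, 13_500, 135_000):
    se = proportion_standard_error(p_reclassified, n)
    print(f"n = {n:>7,}: standard error = {se:.4f}")
```

Cutting the sample by a factor of 10 lowers field cost roughly in proportion but raises the standard error by a factor of about 3.2 (the square root of 10).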

Evaluation of the 1970 National Vacancy Check indicated 
that data quality was high, even with the error introduced 
by sampling (Waksberg, 1970, 1971). The program was im- 
plemented in a conservative manner in several respects. 
First, units in the sample of 13,500 were reclassified 
from vacant to occupied only if the enumerator determined 
that the same family had continuously occupied the house 
during the census enumeration period. On this basis, 
11.4 percent of the sample units were reclassified. In
the imputation procedure applied to the complete set of
census records, a total of 8.5 percent of vacant units,
instead of 11.4 percent, were changed to occupied and
persons imputed to these units (to attempt to account for
the smaller average household size of misclassified units
in the sample compared with correctly classified units).
It turned out that this procedure imputed somewhat fewer 
persons than expected because the imputed household size 
for the reclassified units on average was yet smaller 
than the average household size targeted for the imputa- 
tion. The best estimate is that no more than another 0.1 
percent should have been added to the population count 
(Bureau of the Census, 1974b:12-13).
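
One way to read the adjustment from 11.4 to 8.5 percent is that reclassifying fewer units, each imputed at a typical household size, was meant to yield about the same number of persons as reclassifying 11.4 percent of units at their smaller true size. The sketch below works through that reading. The equal-persons interpretation, the function names, and the unit count and household size are assumptions for illustration, not figures from the source.

```python
def implied_size_ratio(sample_rate: float, applied_rate: float) -> float:
    """Ratio of misclassified-unit household size to typical household size
    at which both rates impute the same number of persons."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return applied_rate / sample_rate


def imputed_persons(vacant_units: int, rate: float, mean_size: float) -> float:
    """Persons imputed when `rate` of vacant units become occupied."""
    return vacant_units * rate * mean_size


ratio = implied_size_ratio(0.114, 0.085)
print(f"implied household-size ratio: {ratio:.3f}")

vacant_units = 100_000  # hypothetical count of vacant units
typical_size = 3.0      # hypothetical mean household size of donor units

direct = imputed_persons(vacant_units, 0.114, typical_size * ratio)
adjusted = imputed_persons(vacant_units, 0.085, typical_size)
print(f"direct: {direct:,.0f} persons; adjusted: {adjusted:,.0f} persons")
```

On this reading, misclassified units were assumed to hold about three-quarters as many persons as correctly classified occupied units. The text reports that the households actually imputed were smaller still, so somewhat fewer persons were added than intended.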

The Post-Enumeration Post Office Check added 0.3 per- 
cent to the housing unit count overall and 0.2 percent to 
the population count. The program added 1.3 percent in 
the conventionally enumerated areas of the South in which 
it was carried out. The recheck of units in which the
respondent reported more living quarters than there were
addresses for the structure on the mailing list added at
a minimum about 0.2 percent to the population count
overall. The Census Bureau was only able to estimate the
effects of this program for questionnaires returned by
mail. For the latter universe, the added persons shown
in Table 5.2 represent about 0.3 percent of the total.
The Census Bureau estimated that the Report of Living
Quarters Check was erroneously omitted in one of three
cases; if the check had been made for all applicable
addresses, at least 0.3 percent would have been added to
the total population count (Bureau of the Census,
1974b:4).

The programs directed toward finding missed persons
(category 3) were least effective in terms of additions
to the count. The Supplemental Forms Operation added
less than 0.1 percent to the total population count,
although, as previously noted, these forms were generally
processed only where they represented 1 percent or more
of the initially enumerated population. The Movers Check
added a negligible number of persons overall and 0.6
percent to the population of the areas in which it was
performed (the same 17 large metropolitan areas in which
the Precanvass was implemented). The Census Bureau
estimated that the Movers Check would have added another
0.6 percent to the population of these 17 areas if the
program had been carried out completely according to
specifications (Bureau of the Census, 1974b:8).

As noted in a previous section, the 1970 census missed
proportionately more persons in otherwise enumerated
households than in missed households compared with the
1950 experience. This result is probably due at least in
part to the relative effort and success achieved by the
programs aimed toward housing unit coverage (the first
and second categories in Table 5.2) versus the programs
aimed at identifying missed persons. Another point to
emphasize regarding the 1970 coverage improvement
strategy is that many programs were carried out on a
selective basis in areas in which it was felt they would
be particularly effective or for which the effort was
believed to be justified in terms of cost. Two programs,
the National Vacancy Check and the PEPOC, were carried
out on a sample basis and the results used to impute
persons to the census. Finally, there was some effort
evident in the 1970 program, specifically in the National
Vacancy Check, to guard against overcounting as well as
undercounting.



Coverage Improvement in 1980 

The 1980 coverage improvement strategy exhibited three 
differences from 1970: 

(1) The resources put into coverage improvement in 
1980 exceeded the resources spent in 1970 (expressed in 
1980 dollars) by several orders of magnitude, reflecting 
the belief that every effort was necessary to obtain 
accurate coverage to satisfy needs for fund allocation,
redistricting, equal employment opportunity actions, and
other important public policy uses of census data. Pro- 
grams aimed at increasing public cooperation, particularly 
among hard-to-count groups, such as special publicity 
efforts, assistance centers, and foreign-language ques- 
tionnaires, were greatly expanded, as were the number and 
extent of programs designed specifically to add housing 
units and persons to the count. 

(2) The Census Bureau made a deliberate decision to 
conduct most specific coverage improvement programs on a 
nationwide basis and to avoid the use of sampling and 
imputation. However, some programs were implemented 
selectively in areas specifically designated for the 
purpose. 

(3) Several new programs were adopted to tackle the 
problem of within-household undercoverage, although most 
programs, as in 1970, were directed toward improvement of 
the address list either before or after Census Day. 

Table 5.3 provides statistics from Census Bureau 
evaluations (see Thompson, 1984; updated in Bureau of the
Census, 1985c) regarding coverage improvement efforts in 
1980 for programs implemented prior to Census Day directed 
at improving the address list and programs implemented 
during data collection. Again, estimates of cost and 
added housing units and persons are approximate. Overall, 
the 1980 census coverage improvement effort, including 
all the programs listed in Table 5.3, accounted for almost 
9 percent of the total costs of the census and added 
about 8.4 percent to the total population count. In 
1970, the coverage improvement programs listed in Table 
5.2 accounted for about 3 percent of the total census 
costs and added about 5.6 percent to the total population. 
As in 1970, the cost-effectiveness of specific 1980 
census coverage improvement programs varied greatly. 
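
The aggregate figures just cited can be compared as population added per percentage point of census budget. The sketch below uses the rounded values from the text ("almost 9 percent" is entered as 9.0 and "about 3 percent" as 3.0). Because total census costs differed between the two years, this ratio is an illustrative yield measure, not a cost per person; the names are hypothetical.

```python
# Rounded figures from the text: share of total census cost spent on
# coverage improvement and percent added to the population count.
coverage_programs = {
    1970: {"cost_share_pct": 3.0, "population_added_pct": 5.6},
    1980: {"cost_share_pct": 9.0, "population_added_pct": 8.4},
}


def added_per_cost_point(population_added_pct: float,
                         cost_share_pct: float) -> float:
    """Percent of population added per percentage point of census budget."""
    if cost_share_pct <= 0:
        raise ValueError("cost_share_pct must be positive")
    return population_added_pct / cost_share_pct


for year, figures in coverage_programs.items():
    ratio = added_per_cost_point(figures["population_added_pct"],
                                 figures["cost_share_pct"])
    print(f"{year}: {ratio:.2f} percent added per point of budget")
```

On these rounded inputs the 1980 programs returned about half as much added population per budget point as the 1970 programs, consistent with diminishing returns as the effort expanded.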

The 1980 census address list improvement programs
carried out prior to Census Day (Table 5.3, Panel A)
proved extremely cost-effective. The Advance Post Office
Check performed by U.S. Postal Service staff in summer
1979 and the Precanvass carried out by Census Bureau
staff in early 1980 each added well over 2 percent to the
U.S. total housing unit count. Both of these programs
were limited to the tape address register (TAR) areas
(that is, city delivery areas for which the Census Bureau
had developed computerized geographic coding files and
purchased commercial mailing lists), and, in those areas,
they added between 4 and 5 percent each to the housing
unit count for a cost of about $4 per added housing
unit. Comparable figures for the 1970 programs are 3.1
percent of housing units added by APOC in the TAR areas
and 2.3 percent of units added by the Precanvass in 17
metropolitan areas, for a cost of about $1 (in 1980
dollars) per added unit. The 1980 Casing and Time of
Delivery checks implemented by Postal Service staff just
prior to Census Day in the entire mail census area
(including 95 percent of the population in TAR plus
prelist areas) also added over 2 percent to the U.S.
total housing unit count for about the same cost as the
other two programs. In 1970, these checks added over 4
percent of the housing units in the mailout areas.

TABLE 5.3 [Body of table illegible in this copy. Only the
Panel C stub, one column of figures, and the truncated
table notes survive.]

Panel C: Programs to improve person
count during data collection
  Casual Count
  Coverage Questions and
    Dependent Roster Check            93
  Nonhousehold Sources Program
  Were You Counted                    17
  Subtotal                           110

TOTAL                              7,162

NOTE: Total 1980 housing unit and population counts, u...
conducted only in some areas of the country. Hence whe...
units, care should be taken to use the appropriate denom...
a Also corrected 2.9 million addresses.
b Also transferred 570,000 units from one geographic ar...
c Also transferred 48,000 housing units and 56,000 pers...
d Also reclassified 590,000 units from vacant to occupied,
... 507,000 vacant units from the housing inventory.
e Per housing unit and per person cost calculated as 55...
respectively.
f Per housing unit cost calculated as 41% of total costs (s...
i Housing unit and person additions and costs based on c...
j Numerator = $7,770 (costs of Coverage Questions an...
k Numerator = $64,372 ($28,060 + $28,542 + $7,77...
l Numerator = $95,871 ($28,060 + $49,971 + $17,84...

SOURCE: Calculated from Bureau of the Census (1985...

The programs carried out during data collection that
were primarily directed at checking the address list or
at determining whether units were correctly classified as
occupied or vacant (Table 5.3, Panel B) proved much more
expensive than the pre-Census Day programs. These
programs included:

* Local Review. The Census Bureau provided
preliminary housing unit and also population counts
to local officials after completion of the first
stage of follow-up. Officials reviewed the counts
and indicated problem areas for checking.

* Post-Enumeration Post Office Check. In contrast
with 1970, PEPOC was carried out in all
conventionally enumerated areas of the country on a 100
percent basis as part of the second stage of
follow-up.

* Prelist Recanvass. In prelist areas, the address
list was rechecked during the second stage of
follow-up. In some areas, only selected
enumeration districts were recanvassed.

* Vacant/Delete Check. In contrast with 1970, the
1980 Vacant/Delete Check was implemented on a 100
percent basis during the second stage of follow-up.
Each of 8.4 million housing units originally
classified as vacant or as "delete" because they
were not residential was rechecked in the field.

These four programs added about 1 percent to the total 
population count for an average cost of $23 per person
added. Over 80 percent of this improvement was due to 
the Vacant/Delete Check. The Prelist Recanvass had the 
highest unit costs, and there is evidence that it experi- 
enced severe operational problems that diminished its 
effectiveness. The Local Review program also had high 
unit costs and added less than 0.1 percent to the 
population count. Local Review was very unevenly imple- 
mented across the country; many areas did not participate. 
The effectiveness of the PEPOC in terms of adding persons 
is understated in Table 5.3 because it was carried out in 
conventional areas representing only 5 percent of the 
total U.S. population. In these areas, PEPOC added 1.2 
percent to the population count, about the same as the 
performance in 1970, although the cost to add a person in 
1980 was almost two and a half times the 1970 cost, 
reflecting the difference between a 100 percent and a 
sample operation. The Vacant/Delete Check, as discussed 
further below, probably introduced a measure of over- 
counting as well as reducing the undercount. The 1980 
program added 0.8 percent to the population count compared 
with 0.5 percent for the 1970 effort. The cost to add a 
person from the 1980 Vacant/Delete Check was fully 100 
times the 1970 cost, reflecting the great increase in the 
number of units that were rechecked in the field. 

There are data on the characteristics of persons added 
to the census in 1980 from some of these programs. 
Evidence suggests that the Prelist Recanvass replicated 
the race distribution in the general population and hence 
did not help reduce differential undercount (Thompson, 
1984:12). This further lowers the panel's assessment of 
its relative cost-effectiveness. The Vacant/Delete Check, 
by contrast, made a measurable impact on differential 
coverage rates. Based on available data, it appears that 
this program may have reduced the black versus white 
differential undercoverage by 0.5 percentage points 
(estimated from Thompson, 1984:23). 

The programs carried out to improve the person count 
during data collection (see Table 5.3, Panel C) proved 
least cost-effective. These programs included: 



200 

* Casual Count. This operation was similar to the
1970 Missed Persons Campaign, except that, instead of
relying on community organizations, the Census Bureau
sent special enumerators about 6 weeks after Census Day
to places frequented by transients who might otherwise be
missed. The operation was limited to centralized (city)
district offices.

* Coverage Questions and Dependent Roster Check.
This program was directed both toward adding housing
units and persons within households and also toward
reducing erroneous inclusions. Responses to questions on
number of units in the building and the roster of
household members were edited and followed up as appropriate.

* Nonhousehold Sources Program. This operation, an
innovation in 1980, involved matching several
administrative lists to census records for selected census tracts
in urban district offices. The lists used were driver's
license records, immigration records, and public
assistance records in New York City. About 6.8 million persons
were checked against census records.

* Were You Counted. This program was similar to
the 1970 Supplemental Forms Operation.

The above four programs added only 0.2 percent to the 
total 1980 population count for a cost of over $39 per 
added person. The component of the Coverage Questions 
Check that involved rechecking buildings in which the 
respondent reported more living quarters than there were 
addresses on the mailing list appeared less effective in 
adding persons and more costly than the comparable Report 
of Living Quarters Check in 1970. (The Census Bureau was 
not able to evaluate the effectiveness or cost of the 
other coverage questions in 1980.) The Casual Count and 
Were You Counted programs had negligible impact in both 
the 1970 and 1980 censuses. The one major innovation for 
1980, the Nonhousehold Sources Program, which had appeared 
promising in pretests, added only 130,000 persons (less 
than 0.1 percent of the total population and less than 2 
percent of the total number of administrative list entries 
checked against census records), for a cost of over $75
per person added. If the Nonhousehold Sources Program
had been more effective in terms of persons added, the
program could have had a pronounced effect on differential
coverage rates. Among the small group of persons
identified through the Nonhousehold Sources list matching
operation, about one-third each were white, black, and
Hispanic, compared with the breakdown in the general
population of over 80 percent white, 12 percent black,
and 6 percent Hispanic (Thompson, 1984:18-19).
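
The potential effect on differential coverage can be made concrete by comparing each group's share among the persons added with its share of the general population, and by computing the match yield. The sketch below uses the figures quoted above; "over 80 percent white" is entered as 0.80, so the ratio for that group is an upper bound. The function names are illustrative.

```python
# Approximate shares from the text.
added_share = {"white": 1 / 3, "black": 1 / 3, "Hispanic": 1 / 3}
population_share = {"white": 0.80, "black": 0.12, "Hispanic": 0.06}


def representation_ratio(share_of_added: float, share_of_population: float) -> float:
    """How over- or under-represented a group is among added persons
    relative to the general population (1.0 means proportional)."""
    if share_of_population <= 0:
        raise ValueError("share_of_population must be positive")
    return share_of_added / share_of_population


def match_yield(persons_added: int, entries_checked: int) -> float:
    """Fraction of checked administrative list entries that added a person."""
    if entries_checked <= 0:
        raise ValueError("entries_checked must be positive")
    return persons_added / entries_checked


for group, share in added_share.items():
    ratio = representation_ratio(share, population_share[group])
    print(f"{group}: {ratio:.1f} times population share")

print(f"match yield: {match_yield(130_000, 6_800_000):.1%}")
```

Black and Hispanic persons appear among the additions at several times their population shares, which is why a higher yield from the program could have narrowed differential undercoverage.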

In addition to programs designed to add persons during 
the data collection stage, the 1980 census effort included 
a program called Whole Household Usual Home Elsewhere, 
which was designed to increase the accuracy of the count 
by area. In this effort, about 1 million persons were 
transferred from one enumeration district to another in 
accordance with the Census Bureau's rules of usual place 
of residence. For example, persons residing in a vacation 
home on Census Day had their data transferred to the loca- 
tion of their usual home. Other programs, such as Local 
Review and the Precanvass, also produced transfers as 
well as net additions. 



Evaluation of Coverage Improvement Experience in 1980 

Looking at the 1980 coverage improvement programs, it 
appears evident that programs carried out prior to Census 
Day to check the address list were important in improving 
the count and low in cost in terms of dollars per housing 
unit added to the list. Moreover, because these programs 
were implemented before the enumeration, any additions 
that were in fact duplications could be corrected sub- 
sequently. 

The costs per person added by the programs administered 
during data collection were quite high. The Nonhousehold 
Sources Program stands out in this regard, as does the 
Prelist Recanvass. The Vacant/Delete Check, although not 
the most costly on a per person added basis, was the most 
expensive program in total costs but it significantly 
reduced the differential undercount, which is of key 
importance. 

There is evidence that the Vacant/Delete Check con- 
tributed to overcount as well as importantly reducing 
undercount (see Bureau of the Census, 1985c:Ch.8). The 
1980 program (in contrast to the 1970 National Vacancy 
Check) was designed not only to verify the status of 
units originally classified as vacant or delete but also 
to identify and enumerate persons who were missed in the 
census because they were moving from an old to a new 
residence. Enumerators were instructed to ask residents 
of units originally classified as vacant whether they had 
moved in since Census Day, and, if so, whether they had 
been counted at their previous residence. Movers who 
stated that they had not been counted were enumerated at
the new address. However, people were often enumerated
without being aware of the fact (for example, because
some other household member filled out the form), and
hence, movers located in the Vacant/Delete Check were at
risk of being counted twice.

Other 1980 census coverage improvement programs, such
as Whole Household Usual Home Elsewhere, also probably
contributed to overcount. The fact that all the coverage
improvement programs were implemented clerically, without
substantial use of automation, undoubtedly served to
increase cost and reduce effectiveness. This was
particularly true for programs that were carried out in the
final stages of follow-up, when there was great pressure on
the district offices to close out their operations.

Overall, the three predata collection coverage
improvement programs, together with the Vacant/Delete Check,
accounted for over 95 percent of persons added but only
66 percent of the coverage improvement budget, casting
doubt on the cost-effectiveness of the other approaches.
These comparisons would be even more favorable to these
specific programs if the Vacant/Delete Check had been
carried out on a sample basis, as in 1970.
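
The comparison of shares implies a ratio of average cost per person added between the remaining programs and the four leading ones. The sketch below works it out from the two percentages in the text. "Over 95 percent" is entered as 0.95, so the result is a lower bound; the function name is illustrative.

```python
def relative_cost_per_person(adds_share: float, budget_share: float) -> float:
    """Average cost per person added by all other programs, relative to the
    programs that account for `adds_share` of additions and `budget_share`
    of the budget."""
    if not (0.0 < adds_share < 1.0 and 0.0 < budget_share < 1.0):
        raise ValueError("shares must lie strictly between 0 and 1")
    leading = budget_share / adds_share
    others = (1.0 - budget_share) / (1.0 - adds_share)
    return others / leading


# Four leading programs: 95 percent of persons added, 66 percent of budget.
ratio = relative_cost_per_person(adds_share=0.95, budget_share=0.66)
print(f"other programs cost about {ratio:.1f} times as much per person added")
```

On these figures the remaining programs cost roughly ten times as much per person added, which is the basis for the doubt expressed about their cost-effectiveness.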



CENSUS BUREAU PLANS FOR TESTING COVERAGE IMPROVEMENT 
PROGRAMS FOR 1990 

The Census Bureau's testing program for the 1990 census
began in spring 1984 with tests in several urban and
rural localities of improved methods of address list
compilation, a key element in achieving completeness of
coverage (Bureau of the Census, 1984b). Included in
plans for 1986 pretests are many tests related to coverage
improvement (see Johnson, 1984; updated in Bureau of the
Census, 1985b). Almost all the programs implemented in
1980 are scheduled for further testing in 1986, along
with some new programs. Current plans call for testing
improved techniques and procedures for the following
programs that were used in 1980:

* Advance Post Office Check. As a high priority
pretest objective, the Census Bureau proposes to test
use of mailout-mailback procedures in rural areas that
were conventionally enumerated in 1980. One procedure to
be tested would be to have Census Bureau staff prelist
the area, followed by an APOC, with the Postal Service
delivering the questionnaires. There is a proposal in
urban areas to test enhancing the APOC by adding
identification of problem addresses (e.g., addresses where
there is a mail drop for an entire building).

* Precanvass. The Census Bureau proposes testing
an enhancement of the Precanvass that includes correcting
addresses within all multiunit structures, even where the
count in the structure from the Precanvass agrees with
the count on the address register, and also extending
both the APOC and the Precanvass operations to prelist as
well as tape address register areas.

* Casing and Time of Delivery Checks and Local 
Review. Various improvements to these operations are 
proposed for testing. 

* Vacant/Delete Check. The Census Bureau proposes 
to test ways of improving the effectiveness of this 
program, not including, however, consideration of 
conducting the program on a sample basis. 

* Casual Count. Tests of automating the process of 
searching for persons identified in the Casual Count 
operation and of adding them to the census are proposed. 

* Coverage Questions and Dependent Roster Check. 
The Census Bureau proposes to examine the combination of 
questions used in 1980 to check within household coverage 
to determine if rewording, new instructions, or other 
changes will increase their effectiveness, and also to 
test adding questions about multiple residences that 
could help minimize overcounting. The Census Bureau also 
proposes to test improvements in the Whole Household 
Usual Home Elsewhere program. 

* Nonhousehold Sources Program. Various possible 
improvements are proposed for testing in this program, 
such as the use of new sources of administrative lists 
and the use of automated matching and searching 
techniques. 

The only programs used in 1980 that are not proposed 
for testing in 1986 are the Prelist Recanvass and the 
Post-Enumeration Post Office Check. (Conventional area 
enumeration methods, which include PEPOC, will not be 
tested in the 1986 round of pretests and may not be used 
at all in 1990.) A new program being considered for 
testing is the use of an update list/leave procedure in 
prelist areas, in which Census Bureau enumerators instead 
of the Postal Service would deliver questionnaires and at 
the same time update the address list. Update list/leave 
is also proposed for testing (although perhaps not until
1987) in multiunit structures in urban areas that pose
special problems for mail delivery.

The Census Bureau has outlined an ambitious testing
program related to coverage improvement. The panel stated
its belief in Chapter 3 that the Census Bureau must choose
among the ideas proposed for testing. In the following
sections, we offer recommendations regarding priorities
for testing and research in the area of improving the
count. The discussion first addresses needed research on
hard-to-count groups and on problems of overcount.



NEEDED RESEARCH ON UNDERCOUNT AND OVERCOUNT 

The panel supports further work at the Census Bureau to
analyze the characteristics of population groups and
areas more subject to census undercount and also those
more likely to be overcounted. The panel also supports
further analysis, to the extent available data permit, of
the effectiveness of coverage improvement programs in
reducing differential undercount. This research can
contribute importantly to the planning of special coverage
improvement efforts for the next census and also to the
planning of evaluation programs to determine the
completeness of coverage that was achieved. At the present
time, the undercount research staff at the Census Bureau
is continuing investigation of gross undercount and
overcount with the data from the Post-Enumeration Program,
including analyzing enumeration districts that contain
nonmatched cases (that is, gross omission cases in the
Current Population Survey) on characteristics such as
percentage not speaking English and percentage low income.

Recommendation 5.1. We recommend that the Census
Bureau assign a high priority to the completion of 
studies of undercount and overcount in the 1980 
census. 

Research on the characteristics of hard-to-count
groups and of groups and areas prone to overcount will be
more useful for planning coverage improvement and
evaluation programs for the next census to the extent that the
research is completed expeditiously. Designing
pretests for 1990 without having completed research on
undercount and overcount diminishes the value of the
research results and can result in less well-designed
tests.




Recommendation 5.2. We recommend that the Census 
Bureau set up a timetable and assign staff to permit 
completion of the analysis of 1990 coverage evaluation 
results in time to be used in planning the first 
pretest of the 2000 census. 

ISSUES IN COVERAGE IMPROVEMENT: QUESTIONNAIRE CONTENT 

Next the panel discusses priorities for research and 
testing of coverage improvement programs, beginning with 
consideration of items on the questionnaire that relate 
to coverage. These items include the questions on race 
and Hispanic origin as well as questions designed 
specifically to help coverage, such as number of living 
quarters or addresses in the respondent's building. The 
population counts for race and Hispanic groups are 
affected by the accuracy of reporting race and ethnicity 
as well as by coverage errors, and it is important to 
understand what responses to these questions mean if 
appropriate estimates of coverage rates are to be 
developed (for example, from demographic analysis).

Race and Hispanic Origin Questions 

Information about race and ethnicity, including particu- 
larly Hispanic origin, is required for the implementation 
of a number of federal and state laws pertaining to pol- 
itical representation, civil rights, and assistance to 
disadvantaged groups. Even if it were not for these 
specific legal requirements, such information would be 
needed as a basis for understanding the political and 
economic status of various racial and ethnic groups. The 
legal uses of racial and ethnic categories reflect basic 
political and economic concerns of U.S. society today. 
These concerns are evident in the importance attached to 
completeness of coverage in the census for race and ethnic 
groups. Differential rates of net undercoverage (for
example, the net undercount rate of greater than 5 percent
estimated for blacks in 1980 compared with a rate probably
considerably less than 1.5 percent for all others) have
excited more attention than the undercount rate for the
entire population.

Information about race has been collected in each census
since 1790. A specific separate question on Hispanic
origin was introduced for the first time in 1970, when it
was asked on a sample basis. In 1980, a question on
Hispanic origin was included on the short form.

For 1990, issues related to the panel's work include:

(1) Whether question design can be improved to yield
more accurate and/or more useful information, including
whether the design should explicitly strive for
comparability with other sources of race and ethnicity
information, such as vital statistics.

(2) Whether, for considerations of coverage improvement,
minimizing respondent burden, or other reasons,
part of the race and ethnicity information could more
appropriately be collected on a sample basis.



Race and Ethnicity Questions in Earlier Censuses 

Changing information needs and societal attitudes about
race and ethnicity have been reflected in changes in
design, content, and enumerator instructions for the race
and ethnicity question(s) from one census to the next.
The frequent changes severely limit data comparability
across succeeding censuses.

In 1920, persons of mixed white and Negro blood were
classified as Mulatto. Anyone who was not classified as
White, Black, Mulatto, Chinese, Japanese, or Indian was
classified as "Other." In 1930, the Mulatto designation
was dropped. Enumerators were instructed to list persons
with any Negro blood, no matter how small the percentage,
as Negro. Persons of Mexican birth or parentage were to
be listed as "Mexican" unless definitely Negro, Indian,
Chinese, or Japanese. In 1940, Mexicans were listed as
white unless definitely Indian or some other race.

There were apparently no further major definitional
changes in 1950 or 1960. In 1960, racial designations
and, in 1970, ethnic designations, were placed on a
self-identification basis, although, where data were
collected by an enumerator, the enumerator was allowed to
fill in blanks by observation when possible. In 1980,
however, enumerators were no longer allowed to enter race
by observation. In every modern census, missing responses
have been filled in via editing and imputation routines.




The Directive to Standardize Federal Race and 
Ethnicity Information 

Increased legal and program uses of racial and ethnic 
designations in the 1960s and 1970s produced a prolifera- 
tion of race and ethnic data collections by various 
agencies, using a variety of concepts and definitions. 
To improve data comparability, the Office of Management 
and Budget's (OMB) Statistical Policy Division in 1977 
established standard categories to be used by all federal 
agencies collecting data on race. The prescribed racial
categories are: white, black, American Indian or Alaskan
Native, and Asian or Pacific Islander. The ethnicity
categories are: Hispanic origin, not of Hispanic origin.
Alternatively, Statistical Policy Directive 15 allows
agencies to use a combined race and ethnicity
categorization: white (not Hispanic), black (not Hispanic),
Hispanic, American Indian or Alaskan Native, Asian or
Pacific Islander.



The 1980 Census 

The race question on the 1980 census was designed with 
the aim of obtaining accurate information that could be 
aggregated into the OMB prescribed groupings with minimum 
need for hand tabulation. Since there was evidence that 
many respondents might be unaware that their racial back- 
ground was one that the federal government includes in 
the "Asian and Pacific Islander" group, nine separate 
race or ethnic groups for aggregation into this category 
were listed. Also listed were white, black, Indian 
(Amer.), Eskimo, Aleut, and "other," for a total of 15 
categories (see Figure 5.3 for question format). 
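
The aggregation of the 15 listed categories into the OMB groupings can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Census Bureau's tabulation procedure: the names OMB_GROUP and aggregate_to_omb are hypothetical, the assignment of Eskimo and Aleut to the "American Indian or Alaskan Native" grouping is assumed, and the counts in the usage note are made up.

```python
from collections import Counter

# Response categories listed on the 1980 race question, mapped to the
# OMB groupings named in the text. The mapping of "Eskimo" and "Aleut"
# to "American Indian or Alaskan Native" is an assumption made for
# illustration; all identifiers here are hypothetical.
OMB_GROUP = {
    "White": "White",
    "Black or Negro": "Black",
    "Indian (Amer.)": "American Indian or Alaskan Native",
    "Eskimo": "American Indian or Alaskan Native",
    "Aleut": "American Indian or Alaskan Native",
    "Japanese": "Asian or Pacific Islander",
    "Chinese": "Asian or Pacific Islander",
    "Filipino": "Asian or Pacific Islander",
    "Korean": "Asian or Pacific Islander",
    "Vietnamese": "Asian or Pacific Islander",
    "Asian Indian": "Asian or Pacific Islander",
    "Hawaiian": "Asian or Pacific Islander",
    "Guamanian": "Asian or Pacific Islander",
    "Samoan": "Asian or Pacific Islander",
    "Other": "Other",
}


def aggregate_to_omb(counts):
    """Sum counts of detailed categories into their OMB groupings."""
    totals = Counter()
    for category, n in counts.items():
        totals[OMB_GROUP[category]] += n
    return dict(totals)
```

For example, with made-up counts, aggregate_to_omb({"Japanese": 2, "Samoan": 3, "Aleut": 1}) combines the first two entries into a single "Asian or Pacific Islander" total of 5. Because the detailed counts are retained, the same data can be regrouped later under a different mapping.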

A question on Spanish/Hispanic origin appeared 
separately (and with two other questions intervening) on 
the 1980 census. It requested information for four 
separate Spanish/Hispanic categories (see Figure 5.3). 

Thus two of a total of seven population questions on 
the 1980 census short form were about race or ethnicity. 
Together these two questions took up about 30 percent of 
the space on the population part of the short form. 

Almost 6 million individuals identifying themselves as
Hispanic on the Hispanic origin question (about 40 percent
of total Hispanics) marked "other" on the race question.
In contrast to 1970, when similar responses were
classified as "white" during tabulation, these responses were
kept in the "other" category in 1980. This change
reflected a joint OMB-Census Bureau decision that the great
majority of the Hispanics who responded in this way
understood the race question and did not consider themselves
white. Some data users are critical of this decision,
which they argue impairs the comparability of the 1980
data with the data from the 1940 through 1970 censuses.
The 1980 data have been tabulated and published in such a way
as to permit users to reclassify this group if they wish,
however. Even with such reclassification, data are not
fully comparable from one census to the next, due to a
variety of other changes in question design, enumerator
instructions, and editing rules.



4. Is this person

Fill one circle.

O White                 O Asian Indian
O Black or Negro        O Hawaiian
O Japanese              O Guamanian
O Chinese               O Samoan
O Filipino              O Eskimo
O Korean                O Aleut
O Vietnamese            O Other (Specify)
O Indian (Amer.)
  Print tribe

7. Is this person of Spanish/Hispanic
origin or descent?

Fill one circle.

O No (not Spanish/Hispanic)
O Yes, Mexican, Mexican-Amer., Chicano
O Yes, Puerto Rican
O Yes, Cuban
O Yes, other Spanish/Hispanic

FIGURE 5.3 Race and Hispanic origin questions on the
1980 census short form.



Considerations for 1990 

Collection of information on race and ethnicity in a 
large, diverse country such as the United States is 
inherently difficult. With the introduction of the 
concept of self-identification, the racial and ethnic 
categories moved away from their former precise, or 
pseudo-precise, anthropological definitions and toward 
definitions stemming from commonly perceived cultural 
categories. This shift was appropriate. Certainly, 
questions requiring information about percentages of 
Negro and Indian blood (used, at least in theory, through 
1950) would be generally regarded as offensive today. 

The quest for accurate self-identification by respon- 
dents and the feasibility of computer tabulation produced, 
in 1980, a "race" question that was in fact a mix of 
racial, ethnic, and geographic categories. This was not 
inappropriate, but it does raise the question of whether 
the questions on race and Hispanic origin could be 
combined. 

A related question is the possible need for informa- 
tion on additional ethnic or geographic categories in 
1990. Since 1980 there has been substantial entry into 
the United States of refugee populations from Cambodia, 
Haiti, El Salvador, and elsewhere. These groups remain 
tiny relative to the total size of the U.S. population, 
but it may be that, as groups of particular policy con- 
cern, detailed information on their geographic location 
will be sought. This situation suggests the desirability,
in the interest of keeping the short-form question
of manageable length, of moving some of the detail on
Asian and Pacific Islander categories to the long-form
sample. Arguing against this is the probable difficulty
of obtaining accurate short-form responses without listing
all the detailed categories.

There is no clear evidence that inclusion of detailed 
race and Hispanic origin questions on the short form in 
1980 was a barrier to a complete count. The group that 
logically might find these questions most irritating and
irrelevant is the white, non-Hispanic majority of the 
population. Undercoverage among this group is believed 
to have been minimal. Other population groups seemed, in 
general, willing to supply race and ethnic information 
and, in many cases, insistent on doing so. 

Design of the race and ethnicity question(s) is
complicated by the limitations and ambiguities of common
English-language usage. It is difficult to find brief,
unambiguous, readily understood phraseology for
distinguishing Indians (from India) from Indians (native U.S.
tribes). The 1980 phraseology "Indian (Amer.)" is
ambiguous. Does it apply only to tribes native to the
United States or does it encompass all Indians of North
and South America? Presumably, the former was intended,
but should those of Mexican, South, and Central American
origin not also have the opportunity to conveniently
identify their origin and would this not be useful
information?

The matrix of information that may need to be
collected on the short form is illustrated in Figure 5.4,
but this illustration is not intended as format or
phraseology for use in an actual census question.



Is this person of Spanish/Hispanic origin:

Columns (one entry per row):
    Yes, Spanish/Hispanic (includes Puerto Rican,
        Cuban, Mexican, Chicano and other Hispanic)
    Not Spanish/Hispanic

Rows:
    White
    Black
    Japanese, Chinese, or Korean
    Vietnamese, Cambodian, Laotian, or Thai
    Filipino
    Asian Indian
    Hawaiian, Guamanian, or Samoan
    Eskimo or Aleut
    Indian (U.S. Tribes: print tribe)
    Indian (Mexican, South, or Central Amer.)
    Other (Specify)

FIGURE 5.4 Race and Hispanic origin information that may
be required in 1990.

Recommendation 5.3. We recommend that the Census
Bureau test a variety of question designs for the race
and ethnicity information to be collected in the 1990
census, including some that combine the collection of
information on Hispanic origin with the other race and
ethnicity information.



Developing Race and Ethnicity Questions 

The Census Bureau does not have many opportunities to
test important questionnaire changes, such as changes in
the race and Hispanic origin questions, prior to a census.
Moreover, it is expensive to mount full-scale
questionnaire wording tests, as was done prior to 1970 and 1980
and is planned for 1990 in a national content test
currently scheduled for 1986.

The focus group technique has been successfully
employed to design survey questions. This approach,
originally developed in market research, involves
in-depth discussions with small, usually homogeneous groups
(Higginbotham and Cox, 1979; Slavson, 1979). Focus
groups offer the advantage of being able to probe for 
underlying meanings and hidden associations evoked by 
different question wording that may affect responses in 
unforeseen ways. This feature may be particularly useful 
for the testing of questions on race and ethnicity. While 
focus group findings cannot be directly generalized, focus 
groups can help narrow the range of question alternatives 
that warrant testing with larger and more costly samples 
selected scientifically. 

As a case in point, prior to the 1980 census the Census 
Bureau conducted numerous tests of different wording of 
the question on Hispanic origin. The various pretests 
and dress rehearsals tried out variations of this ques- 
tion, as did the 1976 National Content Test, which had a 
sample size of 28,000 housing units. A number of serious 
response problems were encountered. For example, in 
almost every case in which a question had a category with 
the term "American," such as "Central or South American" 
or "Central or South Amer. (Spanish)," there was evidence 
that some non-Hispanic Americans checked these responses 
(Fernandez and McKenney, 1980). Holding a number of focus
group sessions at an early stage in the questionnaire 
content planning would probably have provided timely 
evidence, for a relatively low cost, of this behavior and 
other response problems. In a similar situation, the 
Social Security Administration successfully used focus 
group interviews to identify problems and ambiguities 
with the race and ethnicity items on a proposed revised 
application form and designed operational tests of using 
alternative versions based on the focus group findings 
(Scherr, 1980; Scherr and Nelson, 1980).

Focus groups cannot and should not replace other
methods of questionnaire development, including sample
surveys with alternative questionnaires and controlled
laboratory or classroom experiments. (The Census Bureau
conducted a number of classroom experiments prior to the
1980 census that provided useful findings regarding
placement of instructions, the position of particular
items on the questionnaire, requiring respondents to make
machine-readable entries for date of birth, and the use
of graphics; see Rothwell, 1983.) However, we believe
that the use of focus groups for questionnaire
development of sensitive and ambiguous items such as race and
ethnicity would be very useful. We initially recommended
the focus group technique in our interim report (National
Research Council, 1984), and we note that the Census
Bureau used focus groups in the 1985 pretest in Tampa to
elicit reactions to the questionnaire format.

Recommendation 5.4. We recommend that the Census
Bureau, in addition to other methods that it has 
traditionally employed, use the technique of focus 
group discussions as one means to develop questions on 
particularly sensitive items such as race and 
ethnicity. 



Comparability Considerations 

Although changes in question wording and categories 
for the race and ethnicity items may be necessary to 
improve the information, it is vitally important to 
strive for historical comparability of race and ethnicity 
data from one census to the next to the extent possible. 
Historical comparability is important to permit reliable 
analysis of changes in the status of various groups. 
Cross-temporal comparability is also important for
evaluation of completeness of coverage, for example, using
demographic analysis or reverse record check
methodologies.

Recommendation 5.5. We recommend that, in 1990, as it
did in 1980, the Census Bureau collect, tabulate, and 
release data on race and ethnicity in such a way that 
the data can be reaggregated as necessary to obtain 
maximum feasible comparability with 1980 and 1970. 

Comparability of race and ethnicity data from the 
census with race and ethnicity information collected in 
vital statistics records is also important for at least 
two reasons. First, vital statistics on births and 
deaths are large components of the total population 
estimates by race that are compared with the census 
counts to estimate net undercoverage using the technique 
of demographic analysis. Second, vital rates, such as
birth, death, marriage, and divorce rates, which have
vital statistics data in the numerator and census counts
in the denominator, are important social indicators that
are commonly analyzed by race. For both of these
purposes, it is desirable that the data on race from both
vital statistics and the census be as comparable as
possible.
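
The two uses of census counts just described can be illustrated with a short Python sketch. It assumes the conventional definition of the net undercount rate (the independent population estimate minus the census count, as a percentage of the estimate), which is not spelled out in this chapter; the function names are hypothetical and the figures in the usage note are invented for illustration only.

```python
def net_undercount_rate(independent_estimate, census_count):
    """Net undercount as a percentage of the independent estimate.

    Assumes the conventional definition; the estimate would come from
    a method such as demographic analysis.
    """
    return 100.0 * (independent_estimate - census_count) / independent_estimate


def vital_rate_per_1000(events, population):
    """Vital rate with vital statistics events in the numerator and a
    population count in the denominator, per 1,000 persons."""
    return 1000.0 * events / population
```

With invented figures, a group estimated at 1,000 persons and counted at 950 has a net undercount rate of net_undercount_rate(1000, 950), or 5 percent. If 19 births are recorded for the group, the birth rate computed on the census count, vital_rate_per_1000(19, 950), is higher than the rate computed on the estimate, vital_rate_per_1000(19, 1000). The sketch also shows why the numerator and denominator need to classify race in the same way: the ratio is meaningful only if both refer to the same group.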

There will probably always be differences between the 
concepts of race and ethnicity as collected in vital 
statistics and in the census, if only because the methods 
of data collection vary: self-enumeration in the census
versus identification by others in vital statistics
(parents or medical staff for newborns and relatives or
medical staff for decedents). Nevertheless, discrepancies
due to differences in categories and editing rules could 
be minimized. 

Currently, definitions of race and ethnicity differ in 
vital statistics from those used in the decennial census 
in several important ways. First, vital statistics 
records include Mexicans, Cubans, and Puerto Ricans in 
the white race category. Second, not all states deter- 
mine Hispanic origin, although the 22 states that do are 
estimated to account for 90 percent of Hispanic births. 
Third, there are some differences in editing rules when 
race is mixed or unclear. For example, in vital
statistics birth records, newborns of mixed parentage are
assigned the race of the father, unless the father is
white or the mother is Hawaiian, in which case the child
is classified according to the mother's race (see National
Center for Health Statistics, 1982a, 1982b). In the 1980
census, by contrast, persons of mixed parentage who could
not specify a single category were coded according to the
race of the mother (in 1970, the rule was to use the race
of the father; see Bureau of the Census, 1983c).
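
The three editing rules just described can be compared side by side in a short Python sketch. It encodes only the rules as stated in the paragraph above; the function name, the rule labels, and the category strings are hypothetical, and the actual edit specifications of the two agencies may contain further conditions not covered here.

```python
def assign_race(father, mother, rule):
    """Assign a single race to a person of mixed parentage.

    rule is one of "vital_statistics", "census_1980", "census_1970"
    (labels invented for this sketch). The census rules apply only to
    persons who could not specify a single category themselves.
    """
    if father == mother:
        return father  # not a mixed-parentage case
    if rule == "vital_statistics":
        # Father's race, unless the father is white or the mother is
        # Hawaiian, in which case the mother's race is used.
        if father == "White" or mother == "Hawaiian":
            return mother
        return father
    if rule == "census_1980":
        return mother
    if rule == "census_1970":
        return father
    raise ValueError(f"unknown rule: {rule}")
```

For a child of a black father and a white mother, the vital statistics rule and the 1970 census rule both return "Black", while the 1980 census rule returns "White". The same individual can therefore be classified differently in birth records and in the census, which is the kind of discrepancy that coordinated editing rules would reduce.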

At present, the National Center for Health Statistics 
is reevaluating the standard certificates for vital 
events. Specifically, the center is requesting comments 
on whether the birth and death certificates should 
include a question on ethnic origin or descent separate 
from the race item and whether the question should ask 
simply for Hispanic origin or ask for origin in every 
case, such as Italian, English, Cuban, etc. 1 



1 Personal communication from John E. Patterson to Miron 
L. Straf, October 26, 1984. 




Recommendation 5.6. We recommend that the Census
Bureau, the National Center for Health Statistics, and
other relevant federal agencies work closely together
to design questions and response editing rules on race
and ethnicity that minimize conceptual differences
between census and vital statistics records to the
extent feasible. The Office of Management and Budget
should act as necessary to facilitate such coordination.



Coverage Questions 

The 1970 and 1980 censuses included several questions on 
the short form designed to aid in achieving a complete 
and accurate count. In 1970, Question H-A asked, "How
many living quarters, occupied and vacant, are at this
address?" with categories provided from 1 to 10 or more.
Answers to this question were checked against the address
list for structures with under 10 units to identify missed
households. In 1980, the same question was asked as
Question H-4 and edited as in 1970. In addition, the 
1980 questionnaire included as Question 1 on the first 
page a space to list the name of each person living there 
on Tuesday, April 1, 1980, or who was visiting and had no 
other home (see Figure 5.5). An edit was performed to 
check that the number of names listed in this household 
roster agreed with the number appearing on the inside of 
the questionnaire; field follow-up took place if there 
were more names on the roster than inside. Finally, the 
1980 questionnaire included 3 questions (H-1, H-2, H-3)
that probed for persons whom the respondent either failed 
to list in Question 1 or improperly included (see Figure 
5.5). 
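
The two clerical edits described above amount to simple comparisons, sketched below in Python. The function names are hypothetical. The roster rule follows the text directly. The direction of the H-4 comparison (flagging a questionnaire when the respondent reports more living quarters than the address list shows) and the treatment of the "10 or more" and mobile home answers are assumptions made for this sketch, not documented Census Bureau procedure.

```python
def roster_edit_needs_followup(names_on_roster, persons_inside):
    """Question 1 edit: follow up if the roster on the first page lists
    more names than there are persons reported inside the form."""
    return names_on_roster > persons_inside


def h4_edit_flags_missed_units(reported_units, listed_units):
    """H-4 edit for structures with under 10 units on the address list.

    reported_units: respondent's answer, with 10 standing for
        "10 or more" and None for "mobile home or trailer" or blank.
    listed_units: number of units on the address list.
    Assumption: the edit fails when more units are reported than listed.
    """
    if reported_units is None or listed_units >= 10:
        return False  # edit not applied in this sketch
    return reported_units > listed_units
```

A questionnaire reporting 3 living quarters at an address with 2 listed units would be flagged by h4_edit_flags_missed_units(3, 2), and a form with 5 roster names but 4 persons reported inside would be sent to follow-up by roster_edit_needs_followup(5, 4).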

As discussed above, evaluation indicated that the H-4 
edit in 1980 was less successful in adding housing units 
and persons than the comparable edit in 1970. Neither 
effort added more than a fraction of 1 percent to the 
population count. Review of questionnaires in 1980 that 
failed the H-4 edit indicated that the census office 
staff had a difficult time in conducting the edit and 
also that some respondents may not have correctly 
interpreted the question (Thompson, 1984:15). 

The Census Bureau was unable to evaluate the
effectiveness of the household roster (Question 1) edit,
because of the absence of appropriate records, but
looking at Figure 5.5 suggests that respondents may well
have had problems with the instructions indicating which
persons to list in Question 1 and which to omit.
Similarly, the instructions do not seem at all clear for
households that on Census Day were at a vacation
residence but had a usual residence elsewhere.



Question 1

1. What is the name of each person who was living
here on Tuesday, April 1, 1980, or who was
staying or visiting here and had no other home?

List in Question 1

* Family members living here, including babies still in the
  hospital.
* Relatives living here.
* Lodgers or boarders living here.
* Other persons living here.
* College students who stay here while attending college,
  even if their parents live elsewhere.
* Persons who usually live here but are temporarily away
  (including children in boarding school below the college
  level).
* Persons with a home elsewhere but who stay here most of
  the week while working.

Do Not List in Question 1

* Any person away from here in the Armed Forces.
* Any college student who stays somewhere else while
  attending college.
* Any person who usually stays somewhere else most of the
  week while working there.
* Any person away from here in an institution such as a
  home for the aged or mental hospital.
* Any person staying or visiting here who has a usual home
  elsewhere.

Note

If everyone here is staying only temporarily and has a
usual home elsewhere, please mark this box [ ].
Then please:
* answer the questions on pages 2 through 5 only, and
* enter the address of your usual home on page 20.

If you listed more than 7 persons in Question 1,
please see note on page 20.

NOW PLEASE ANSWER QUESTIONS H1-H12
FOR YOUR HOUSEHOLD

H1. Did you leave anyone out of Question 1 because you were not sure
if the person should be listed, for example, a new baby still in the
hospital, a lodger who also has another home, or a person who stays here
once in a while and has no other home?

O Yes: on page 20 give name(s) and reason left out.
O No

H2. Did you list anyone in Question 1 who is away from home now,
for example, on a vacation or in a hospital?

O Yes: on page 20 give name(s) and reason person is away.
O No

H3. Is anyone visiting here who is not already listed?

O Yes: on page 20 give name of each visitor for whom there is no one
  at the home address to report the person to a census taker.
O No

H4. How many living quarters, occupied and vacant, are at this
address?

O One
O 2 apartments or living quarters
O 3 apartments or living quarters
O 4 apartments or living quarters
O 5 apartments or living quarters
O 6 apartments or living quarters
O 7 apartments or living quarters
O 8 apartments or living quarters
O 9 apartments or living quarters
O 10 or more apartments or living quarters
O This is a mobile home or trailer

FIGURE 5.5 Coverage questions in the 1980 census.

The panel believes it is important that the questions 
and instructions regarding composition of the household 
be clearly communicated to respondents and that responses 
to such questions be given special attention by the field 
offices. This extra care is needed to minimize the pos- 
sibilities for incorrect enumeration, whether it be under- 
count, overcount, or misallocation of persons and/or 
housing units among geographic areas. 

Americans have always been highly mobile: one-sixth of
the population changes residence every year, and some of
those persons are in the process of moving at the time of
the census (Hansen, 1984:Table A). Movers complicate both
completion of an accurate count and evaluation of the 
count. Households with second (vacation) homes also com- 
plicate accurate enumeration. The 1970 census found that 
about 5 percent of households had a second home (Bureau 
of the Census, 1982c:751), and the percentage is growing. 
Finally, recent trends in living arrangements, retirement, 
and the workplace have resulted in populations with two 
or more "usual" residences that present special problems 
for accurate census-taking. Some examples include: 

* Retired persons who have two "usual" homes, one
for the winter months in a warm climate and the
other for the summer months in a cool climate;

* Children of divorced families in which the parents
have joint custody and the children spend a
substantial part of the year, month, or week with
each parent; and

* Two-career couples with jobs and residences in two
different locations.

It is debatable how each of these kinds of persons 
should be counted. Leaving aside the matter of assignment
to a specific household and geographic area, populations 
such as these appear more than usually at risk of under- 
count as well as overcount. 

The Census Bureau is proposing as part of its 1986 
pretest program to consider alternative coverage and 
household roster questions and to test adding questions 
about multiple residences that could help minimize
miscounting. The Census Bureau also plans to test
improvements in the program that attempts to assign households
found at their second (vacation) home to their regular
residence (the Whole Household Usual Home Elsewhere
program).

A range of questionnaire design techniques, including
focus groups, would appear useful to employ for these
questions to determine wording and formats that are clear
to the respondent and also easy for census office staff
to process. Research on trends in mobility, second homes,
and multiple residences should also assist in decennial
census planning. Identifying geographic areas
particularly affected by these phenomena might suggest special
efforts targeted to particular populations in these areas.
It is particularly important in this regard to assess
future trends. If, as appears probable, a growing part
of the population is likely to have two or more usual 
places of residence, own a second home, or to be moving 
between an old and a new residence during the census 
enumeration, then planning for a complete and accurate 
count should give high priority to dealing with these 
groups. 

Recommendation 5.7. We recommend that the Census 
Bureau give high priority in its planning for 1990 to 
research and testing of questions and enumeration 
procedures that address problems of accurately counting 
persons in the process of moving, households with 
second (vacation) homes, and persons with more than 
one usual place of residence. 



A Specific Suggestion for a Coverage Improvement Question 

In the 1977 pretest in Oakland, California, the Census 
Bureau tested the concept of "network" or "multiplicity" 
response rules for coverage evaluation (Sirken et al., 
1978). Such rules include asking parents to provide names
and addresses of children and vice versa. Published 
results suggested that the address information furnished 
was not of sufficient quality to warrant further inves- 
tigation of this method as part of a coverage evaluation 
program that included matching samples of persons to 
census records. 

However, the panel believes that the concept of
generating lists of individuals in an area from the
census operation itself to use as a procedure to improve
coverage is worth investigating, at least for
hard-to-enumerate areas, and comparing with other procedures in
terms of costs and effectiveness. The procedure would be
to ask respondents in the census for lists of specific
types of relatives not living in the household.
Information needed for nonresident relatives to facilitate
locating them and determining if they had been included
in the census would include address and also basic
demographic characteristics, such as age and sex. The panel
believes that relatives may be at least as good a source
of up-to-date address information to use for a matching
operation for coverage improvement as other lists that
are commonly suggested, such as driver's licenses or
welfare records.

The Oakland results suggested that address information
supplied by parents was somewhat more accurate than
information supplied by most other categories of
relatives. Moreover, parents would probably be the most
reliable source of information on a critical match
variable, birth date. Hence, asking parents to provide basic
demographic information and addresses for children not
living in the household could improve coverage,
particularly of hard-to-count groups such as young adult black
and Hispanic males and young black and Hispanic children
in central cities.

Recommendation 5.8. We recommend, as one procedure to 
consider for improving coverage of hard-to-count 
groups, that the Census Bureau pretest a question 
asking parents for names and addresses of children who 
are not part of the household. This question should 
be included in the 1986 pretests. 

Specifically, we propose that a question similar to 
the following be added to the census form: 

Does anyone living in this household have a son or 
daughter living somewhere else? Yes __ No __ 
If yes, please list sons and daughters below. 

Name (Last, First ...)   Sex   Age   Birth Date (Month - Day - Year)   Address 




The object is to improve coverage in hard-to-count 
areas, particularly of young children, and hence it would 
not be cost-effective or even feasible to follow up all 
children reported as not living in the household. Instead, 
the goal would be to examine census returns from 
areas identified as hard to enumerate and to follow up 
those children reported by their parents as living in the 
same area. The question suggested above is phrased to 
ask parents for the addresses of all children not living 
in the household, so that there is no opportunity for 
misinterpretation of which children should be listed, but 
the follow-up could be restricted to children in target 
age-race-sex groups. 

The answers to this question would provide a list of 
individuals that can be matched against the census. 
Presumably the list could be constructed and follow-ups 
(perhaps on a sample basis) of nonmatches done during the 
census operation. Operational questions for a test 
include the accuracy of birth date and address obtained 
from parents, the method of identifying addresses that 
are from hard-to-enumerate areas and should be followed 
up, the method of locating addresses, the use of differ- 
ent procedures in urban and rural areas, and the method 
of sharing information in cities with multiple offices. 
The effects on response rates of asking this question 
also need to be examined. 
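
The follow-up logic described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the record fields, the exact-match key (name, sex, and birth date), and the set of hard-to-enumerate area codes are all hypothetical, and an actual matching operation would have to tolerate reporting errors in names, birth dates, and addresses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # All field names and formats here are hypothetical.
    last: str
    first: str
    sex: str
    birth_date: str  # e.g. "1975-03-02"
    area: str        # code of the area containing the reported address


def match_key(p):
    """Exact-match key; real matching would need to be more forgiving."""
    return (p.last.strip().lower(), p.first.strip().lower(), p.sex, p.birth_date)


def followup_candidates(reported_children, census_records, hard_areas):
    """Children reported by parents, located in a hard-to-enumerate area,
    for whom no matching census record exists in that same area."""
    enumerated = {(match_key(r), r.area) for r in census_records}
    return [
        c for c in reported_children
        if c.area in hard_areas and (match_key(c), c.area) not in enumerated
    ]
```

Restricting the candidate list to target areas before matching reflects the point made above that following up every reported child would not be cost-effective; sampling of the nonmatches could be layered on top of this list.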

The panel recognizes that there are problems in adding 
a question to the census short form, particularly a ques- 
tion that requires a lot of space and that may be viewed 
as invasive of privacy. Indeed, given its intended 
follow-up on a sample basis, the question should perhaps 
be included on the long form only. 

Research and testing of the suggested multiplicity 
coverage question and of other such questions, including 
ones on multiple residences, should be closely coor- 
dinated. The wording and format of all such questions 
must be carefully considered to ensure that the entire 
package is communicated clearly. If there is concern 
over the increased respondent burden, particularly for 
recipients of the long form, the Census Bureau should 
consider deleting other questions. Chapter 6 suggests 
some long-form housing questions that could be deleted 
and collected instead from other sources. The Census 
Bureau should also consider the possibility of different 
questionnaire formats in different areas. For example, 
it might be possible to include the multiplicity coverage 
question only on questionnaires for enumeration districts 
with certain expected characteristics, such as a high 
poverty rate and, conversely, to include questions on 
multiple residences only on questionnaires administered 
in other kinds of areas. 

Because the multiplicity question appears promising 
for coverage improvement and also relates to other cover- 
age questions that the Census Bureau is proposing to test 
in 1986, it is important that the multiplicity question 
be tested in 1986 as well. The panel, in fact, recom- 
mended in its interim report (National Research Council, 
1984) that the multiplicity question be tested in the 
first 1990 census pretest, that is, in 1985. The Census 
Bureau has proposed delaying a test until 1987. For the 
reasons outlined above, the panel believes that high 
priority should be given to testing coverage questions in 
1986 and that these tests should include a multiplicity 
question. 



ISSUES IN COVERAGE IMPROVEMENT: SPECIAL ENUMERATION 
PROCEDURES 

The panel does not propose to comment in detail on each 
of the various coverage improvement procedures used in 
1970 and 1980 and proposed for testing in 1986. We 
believe we can be most useful to the Census Bureau by 
recommending general strategies for deciding the prior- 
ities to assign in its 1990 census research and testing 
program. The Census Bureau staff exercised some selec- 
tion in the process of drawing up the proposed package of 
1986 pretests of coverage improvement programs, but the 
package still seems much too ambitious for the likely 
available staff and budget resources and for the time 
available to design, execute, and evaluate the results 
from this and subsequent pretests prior to 1990. Pretest 
results that cannot be assimilated in time to affect the 
next pretest or the dress rehearsals represent largely 
wasted effort. 

The panel believes that the goal of a complete enumer- 
ation is very important and, as discussed more fully in 
Chapter 7, that adjustment procedures should not be viewed 
as an alternative to obtaining as complete a count as 
possible through cost-effective means. The panel also 
believes that special coverage improvement programs can 
make important contributions to improving the count. 
However, the panel does not subscribe to the view that 
every coverage improvement idea that is suggested should 
be pursued or that programs used in past censuses should 
automatically be included in the plans for the next 
census. The panel believes that evaluation results for 
coverage improvement procedures used in prior censuses 
should be carefully reviewed and that further research 
and testing should be conducted only for programs that 
meet certain criteria of cost-effectiveness, particularly 
in reducing differential undercounts. Similarly, proposed 
ideas for new kinds of procedures should be assessed 
against several criteria to determine the extent to which 
they appear promising and feasible. 

With regard to assigning priorities for research and 
testing of coverage improvement programs with which the 
Census Bureau has prior experience, the panel recommends 
the following strategy: 

Recommendation 5.9. We recommend that the Census 
Bureau review coverage improvement programs used in 
past censuses and proceed with research and testing 
directed toward use in 1990 of those programs that: 
(1) exhibited a high yield in terms of numbers of 
missed persons correctly added to the count and/or 
contributed significantly to reducing differential 
undercoverage, (2) exhibited low-to-moderate costs per 
person correctly added, and (3) did not add many 
persons incorrectly. Programs that do not satisfy 
these criteria should be dropped from consideration 
unless: (1) the program exhibited low total dollar 
costs and had demonstrable public relations or goodwill 
value in previous censuses or (2) there is some par- 
ticular reason to believe a revised program will yield 
greatly improved results. 

The above recommendation does not quantify terms such 
as "high yield" or "low cost." Obviously, the previous 
performance of specific coverage improvement programs 
should be carefully and appropriately measured and a 
decision whether to include a program in the 1990 pretest 
plans carefully arrived at. For example, the proportion 
of housing unit or person additions to the count should 
be measured using the appropriate denominator. In the 
case of a program administered in specified areas, the 
denominator should be the total count only for those 
areas. Admittedly, the available evaluation data are 
subject to margins of error that may be wide for some 
programs. Nonetheless, it seems possible to assign 
priorities through a hard look at the information in hand. 
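
As a rough illustration of the denominator point made above, the sketch below computes a program's yield relative to the count in the areas where it was actually administered, along with its cost per addition. The function names and all figures in the usage note are hypothetical and are not taken from the evaluation data.

```python
def addition_rate(additions, base_count):
    """Additions as a percentage of the count in the areas where the
    program was administered (the appropriate denominator)."""
    if base_count <= 0:
        raise ValueError("base_count must be positive")
    return 100.0 * additions / base_count


def cost_per_addition(total_cost, additions):
    """Dollar cost per housing unit or person added to the count."""
    if additions <= 0:
        raise ValueError("additions must be positive")
    return total_cost / additions
```

For example, a hypothetical program that added 20,000 persons in areas with a count of 2,000,000 at a cost of $500,000 would show `addition_rate(20000, 2000000)` of 1.0 percent and `cost_per_addition(500000, 20000)` of $25. Dividing the same 20,000 additions by a national total would understate the yield of a targeted program.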




Based on the data in Tables 5.2 and 5.3, it appears 
that the various address checking programs carried out in 
advance of Census Day easily qualify for further con- 
sideration both in terms of proportion of additions to 
the count and in terms of cost per addition. Some other 
programs, such as the Were You Counted program, yielded 
very little but were quite inexpensive and appear to have 
goodwill value in conducting the census. Still other 
programs are more problematical. Although the coverage 
questions and roster checks were low yield and costly to 
administer in 1980, it appears essential, as discussed 
above, that research be carried out to develop optimal 
formats and processing procedures for these questions to 
minimize problems of undercount, overcount, and misallo- 
cation among geographic areas. The Vacant/Delete Check 
met a minimum standard of 0.5 percent additions to the 
count, but it was costly in 1980. A high priority for 
further research on this program would involve investiga- 
tion of ways to reduce costs, for example, by returning 
to the use of sampling, as in 1970 (see further discussion 
in Chapter 6, where the panel recommends that the Census 
Bureau conduct research on the use of sampling for the 
Vacant/Delete Check and possibly other coverage improve- 
ment programs). The Nonhousehold Sources Program appears 
to fail on tests both of additions to the count (even 
when the denominator is the number of addresses that were 
selected for matching) and cost. It is possible that the 
use of automation could improve the cost-effectiveness of 
this program as might selection of other kinds of lists, 
but, given that choices must be made, it appears that the 
program should be given low priority. 

For ideas with which the Census Bureau has little or 
no experience, the panel suggests that questions such as 
the following be asked: 

(1) To what extent is the proposed coverage improve- 
ment program directed toward known problem areas? For 
example, the Census Bureau is considering tests of a 
number of means of handling multiunit structures for 
which mail delivery is often problematical, including an 
update list/leave procedure that was tried experimentally 
in a few district offices in 1980. As another example, 
the proposed multiplicity coverage question is directed 
toward hard-to-count groups, specifically young minority 
children. 

(2) To what extent does any available evidence suggest 
that the proposed procedure might prove effective? For 
example, although the update list/leave procedure in 1980 
resulted in significantly higher initial mail return rates 
in the experimental offices compared with the controls (81 
versus 71 percent; see Mikkelson and McKelvey, 1983), no 
significant differences in coverage have been determined 
(see Bailey and Ferrari, 1984; Mikkelson, 1984). 

(3) Do rough paper-and-pencil estimates of cost and 
yield suggest that the proposed program is likely to be 
cost-effective? 

(4) Can the program be implemented in targeted areas 
as a means of improving cost-effectiveness, or can cost 
savings be effected through judicious use of sampling? 

For coverage improvement procedures that the Census 
Bureau decides to retain in its research and testing 
program, the panel believes it is important to further 
categorize them into programs that need early field 
testing versus those that can be researched with other, 
less expensive, and less staff-intensive methods. For 
example, it may not be necessary to include the Casual 
Count program in any early full-scale pretest. Other 
procedures, such as trying out various address checks in 
prelist and conventional areas, probably need early 
testing, particularly to work on integrating these opera- 
tions with the various automation efforts that are being 
given high testing priority. A strategy that does not 
attempt costly field tests of every program should help 
make the Census Bureau's budget and staff resources 
stretch farther and help reduce the problem of a pro- 
liferation of tests producing results that cannot be 
assimilated in time for 1990. Finally, one cost- 
effective means of gaining useful information for 
improving coverage programs would be to conduct focus 
groups that include members of hard-to-count populations. 
Such groups could consider reasons for failure to be 
counted and consider as well the likely impact of 
particular programs. 

Recommendation 5.10. We recommend that the Census 
Bureau conduct full-scale pretests in 1986 only of 
those coverage improvement programs that require such 
testing. Furthermore, we recommend that the Census 
Bureau use focus groups that include members of 
hard-to-count populations as one means to explore 
coverage improvement techniques and to narrow the 
range of options to be field-tested. 



APPENDIX 5.1 
GROSS OMISSIONS AND GROSS OVERENUMERATIONS IN THE CENSUS 

This appendix presents results of coverage evaluation 
programs that identify groups in the population that 
appear to have been less well counted through omission 
and/or erroneous enumeration in the 1980 and previous 
decennial censuses. Most of the results represent 
estimates of gross omissions from the census. Results of 
the demographic analysis method of coverage evaluation 
are discussed in the text but not in this appendix, 
because demographic analysis provides estimates of net 
undercoverage but not of the gross omission or gross 
overenumeration components. Results of studies of the 
completeness of census coverage of housing units are 
briefly reviewed. 

GROSS OMISSIONS OF PEOPLE 

Findings From 1980 Census Coverage Evaluation Programs 
The Post-Enumeration Program 

The PEP program developed estimates of gross omissions 
in the 1980 census through matching interview records 
from the April and August Current Population Surveys (the 
P sample) to census records in the same small geographic 
areas (enumeration districts). The PEP design resulted 
in gross omission rates that represent overestimates 
because, among other reasons, the rates include persons 
who were enumerated in the census but at a location so 
far removed from their address in the CPS that it was 
outside the area of search for a match in the census 
records. The PEP program also encountered many problems 
in implementation. To date, there are 12 separate sets 
of estimates of undercount developed from PEP based on 
different treatment of problems such as nonresponse to 
the CPS (see Chapter 4 for a description and evaluation 
of PEP). Hence, the discussion that follows seeks to 
determine the order of magnitude of differences in gross 
omission rates among population subgroups, but it cannot 
provide precise estimates. The tables shown in this 
section express findings from the PEP in terms of ratios 
of the gross omission rate for a population group to the 
average rate experienced for the total population. Popu- 
lation groups are placed into one of five categories of 
relative gross omission rates: 

(1) Very High: greater than or equal to 3 times the 
average rate; 

(2) High: greater than or equal to 2 times and less 
than 3 times the average rate; 

(3) Moderately High: greater than or equal to 1.25 
times and less than 2 times the average rate; 

(4) Average: greater than .75 times and less than 
1.25 times the average rate; 

(5) Below Average: less than or equal to .75 times 
the average rate. 
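
The five categories above amount to a simple threshold rule on the ratio of a group's gross omission rate to the average rate. A minimal sketch follows; the function name is hypothetical, and the rates in the usage note are illustrative rather than drawn from the PEP tabulations.

```python
def relative_omission_category(group_rate, average_rate):
    """Classify a group's gross omission rate relative to the average rate,
    using the five category boundaries listed above."""
    if average_rate <= 0:
        raise ValueError("average_rate must be positive")
    ratio = group_rate / average_rate
    if ratio >= 3:
        return "Very high"
    if ratio >= 2:
        return "High"
    if ratio >= 1.25:
        return "Moderately high"
    if ratio > 0.75:
        return "Average"
    return "Below average"
```

With an average rate of 4 percent, for instance, a group rate of 8 percent (ratio 2.0) falls in the High category and a group rate of 3 percent (ratio 0.75) falls in the Below average category. Because the classification depends only on the ratio, the same rule applies to any series of estimates once its own average rate is supplied.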

Several tables show relative gross omission rates from 
the PEP for population groups categorized by household 
relationship, race and Hispanic origin (ethnicity), type 
of place, and by ethnicity and type of place crossed with 
rates of nonreturn of the mail questionnaires in the 
district offices. The data represent the results of 
preliminary exploratory analysis conducted by the Census 
Bureau of the PEP 3-8 series. This series was based on 
matching the April CPS to the census and estimated an 
average gross omission rate for the total population in 
1980 of 5.4 percent. 

The 3-8 series happened to be the first series to be 
put into a computerized form suitable for this kind of 
analysis at the Census Bureau. Further work is necessary 
to confirm that the 3-8 findings are reliable and repre- 
sentative of the results shown by other series of esti- 
mates. More limited tabulations of several other PEP 
series of estimates were recently prepared by the Census 
Bureau, and they generally confirm the picture shown by 
the 3-8 series regarding the population groups that were 
relatively harder to count in 1980 (see discussion at the 
end of this section). 

Findings from the PEP 3-8 series. The preliminary 
findings from the 3-8 series with regard to which popula- 
tion groups proved relatively harder to count are not 
surprising, but the dispersion among the five categories 
of relative gross omission rates is not always as great as 
one might expect. On the dimension of household relation- 
ship, members of the nuclear family (head, spouse, and 
son or daughter) were relatively easy to find compared 
with other household members (see Table 5.4). Persons 
not related to the household head and relatives other 
than parents had high rates of gross omissions compared 
with the average for the total population. On the dimen- 
sion of ethnicity (see Table 5.5), blacks were among the 
hardest-to-find groups, with a high relative gross omis- 
sion rate. Persons classified as Hispanic had a moder- 
ately high rate overall but showed dispersion when further 
categorized by place of origin. Puerto Ricans and "other" 
Hispanics had high relative gross omission rates; the 
rate for Mexicans was moderately high; while the rate for 
Cubans was within the average category. Finally, American 
Indians and Asian-Americans had moderately high gross 
omission rates, while the rates for non-Hispanic whites 
and other races were average. 

TABLE 5.4 Household Relationship by Relative Gross Omission Rate for a 
Sample of Persons, Post-Enumeration Program-Census Match (1980, PEP 
Series 3-8) 

Relative Gross Omission Rate(a)    Household Relationship 

Very high                          (none) 
High                               Nonrelative 
                                   Other relative 
                                   Brother or sister 
Moderately high                    Mother or father 
Average                            Head 
                                   Son or daughter 
Below average                      Spouse 

NOTE: Average gross omission rate for the 1980 PEP was 5.4%. 

(a) Categories of relative gross omission rates are as follows: 
(1) Very high: greater than or equal to 3 times the average rate. 
(2) High: greater than or equal to 2 times and less than 3 times the average rate. 
(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 
(4) Average: greater than .75 and less than 1.25 times the average rate. 
(5) Below average: less than or equal to .75 times the average rate. 

SOURCE: Hogan (1983b:3). 

Distributions are not given for other demographic vari- 
ables such as age and sex. Males and females both had 
average rates of gross omissions, as did most age groups. 
(The PEP findings of very little difference in coverage 
rates between men and women contrast with the results of 
demographic analysis, which showed worse coverage for 
men, particularly among blacks.) From unpublished PEP 
3-8 series tabulations, young adults ages 15-24 had a 
moderately high relative rate of gross omissions, while 
persons age 45 and older had below-average rates. The 
PEP does not provide separate estimates of coverage of 
undocumented aliens, although the PEP sample probably 
included some representation of this group. 



TABLE 5.5 Ethnicity and Mail Nonreturn Rate by Relative Gross Omission 
Rate for a Sample of Persons, Post-Enumeration Program-Census Match (1980, 
PEP Series 3-8) 

Relative Gross                                    Mail Nonreturn Rate (mail 
Omission Rate      Ethnicity                      areas only)(b) and 
Category(a)        (detailed categorization)      Ethnicity(c) 

Very high                                         30% or higher:  Black 
                                                                  Hispanic 
High               Black (non-Hispanic)           30% or higher:  Total 
                   Hispanic: Puerto Rican         15-29%:         Black 
                             Other 
Moderately high    Hispanic: Total                30% or higher:  White 
                             Mexican              15-29%:         Hispanic 
                   American Indian                Less than 15%:  Black 
                   Asian 
Average            Hispanic: Cuban                15-29%:         Total 
                   White (non-Hispanic)                           White 
                   Other race (non-Hispanic)      Less than 15%:  Hispanic 
Below average                                     Less than 15%:  Total 
                                                                  White 

NOTE: The average gross omission rate for the 1980 PEP was 5.4%. 

(a) Categories of relative gross omission rates are as follows: 
(1) Very high: greater than or equal to 3 times the average rate. 
(2) High: greater than or equal to 2 times and less than 3 times the average rate. 
(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 
(4) Average: greater than .75 and less than 1.25 times the average rate. 
(5) Below average: less than or equal to .75 times the average rate. 

(b) The mail nonreturn rate is the percentage of occupied households that did not mail their 
questionnaires to census offices. 

(c) The three ethnicity categories shown are exhaustive: black non-Hispanic, Hispanic of all races, 
and white and other race non-Hispanic. 

SOURCE: Hogan (1983b:2; 1983a:attached graphs). 



Among the variables displayed, the dimension of type 
of place (see Table 5.6) shows the least dispersion. 
Central cities of large standard metropolitan statistical 
areas (SMSAs with 3 million or more population) had a 
moderately high relative gross omission rate, while all 
other place types had average rates. Note that areas 
enumerated using conventional techniques rather than a 
mailout-mailback approach had a below-average relative 
gross omission rate. 

Distributions are not given for urban versus rural 
areas or region of the country (Northeast, Midwest, South, 
West), as all of these categories had average gross omis- 
sion rates. This is not to say, however, that more 
in-depth analysis would not reveal interactions between 
region or urban versus rural and other variables. 



TABLE 5.6 Type of Place and Mail Nonreturn Rate by Relative Gross Omis- 
sion Rate for a Sample of Persons, Post-Enumeration Program-Census Match 
(1980, PEP Series 3-8) 

Relative Gross                                    Mail Nonreturn Rate (mail areas 
Omission Rate                                     only)(d) and Type of Place 
Category(a)        Type of Place 

Very high                                         35% or higher:  Central city, large SMSA 
                                                                  Central city, small SMSA 
High                                              30% or higher:  Other, SMSA 
                                                                  Outside SMSA 
                                                  25-34%:         Central city, large SMSA 
Moderately high    Central city, large SMSA(b)    25-34%:         Central city, small SMSA 
                                                  10-24%:         Central city, large SMSA 
Average            Central city, small SMSA(c)    15-29%:         Other, SMSA 
                   Other, SMSA                                    Outside SMSA 
                   Outside SMSA                   5-24%:          Central city, small SMSA 
                   All mailout-mailback areas 
Below average      Conventional areas             0-14%:          Other, SMSA 
                                                                  Outside SMSA 
                                                  0-9%:           Central city, large SMSA 
                                                  0-4%:           Central city, small SMSA 

NOTE: The average gross omission rate for the 1980 PEP was 5.4%. 

(a) Categories of relative gross omission rates are as follows: 
(1) Very high: greater than or equal to 3 times the average rate. 
(2) High: greater than or equal to 2 times and less than 3 times the average rate. 
(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 
(4) Average: greater than .75 and less than 1.25 times the average rate. 
(5) Below average: less than or equal to .75 times the average rate. 

(b) Large standard metropolitan statistical area (SMSA) is defined as an area with over 3 million 
population. 

(c) Small SMSA is defined as an area with 3 million or less population. 

(d) The mail nonreturn rate is the percentage of occupied households that did not mail their 
questionnaires to census offices. 

SOURCE: Hogan (1983b:4; 1983c; and unpublished worksheets). 



When either ethnicity or type of place is crossed with 
the mail nonreturn rate for the district office (that is, 
100 percent minus the percentage rate at which question- 
naires were mailed back from households), the dispersion 
in relative rates of gross omissions increases dramati- 
cally. While blacks on average had a high relative gross 
omission rate, those blacks in district offices with mail 
nonreturn rates of 30 percent or more had a very high 
relative gross omission rate (3 or more times the average 
rate), and, conversely, those blacks in district offices 
with mail nonreturn rates of less than 15 percent had 
only a moderately high relative gross omission rate. A 
similar spread is evident for Hispanics and for non- 
Hispanic whites (see Table 5.5). The dispersion for type 
of place categories is even more extreme when the factor 
of mail nonreturn rates is introduced. Central cities of 
large SMSAs, which on average exhibited a moderately high 
relative rate of omissions, had a very high rate in those 
areas in which the district office mail nonreturn rate 
was 35 percent or greater and, conversely, a below-average 
rate in those areas of the central city in which the mail 
nonreturn rate was under 10 percent (see Table 5.6). 

Findings from Other PEP Series. Unpublished tabula- 
tions of gross omission rates from two other PEP series, 
the 5-8 series, based on matching August CPS records to 
the census, and the 14-20 series, based like the 3-8 
series on matching April CPS records to the census but 
with a different treatment of incomplete cases, generally 
support the findings reported above from the 3-8 series. 
The 5-8 series estimated an overall rate of gross omis- 
sions of 5.25 percent and the 14-20 series a rate of 3.45 
percent compared with the 5.4 percent rate estimated by 
the 3-8 series. In relative terms, all three series 
found that blacks had a high relative gross omission 
rate, Hispanics a moderately high rate, men and women 
average rates, young adults ages 15-24 a moderately high 
rate, persons age 45 and older below-average rates, and 
other age groups average rates. In each case, the 
determination of the gross omission rate category for a 
population group was made relative to the average rate 
for the particular series. Data from the match of August 
CPS records to the census also generally support the 3-8 
series of findings regarding the relationship of high 
mail nonreturn rates to high relative rates of omissions 
(Hogan, 1983a). 



The IRS-Census Match 

A methodological study conducted after the 1980 
census, the IRS-Census Match, although not designed as a 
coverage evaluation study, provides some evidence on 
differential rates of gross omissions from the census. 
The purpose of the IRS-Census Match was to examine 
tracing and matching problems with pre-enumeration 
surveys and reverse record checks. The study attempted 
to match a sample of about 11,000 filers of 1979 tax 
returns to 1980 census records. Black and Hispanic 
filers were oversampled (Childers and Hogan, 1984a). 




The average gross nonmatch rate for the total sample 
was 12.6 percent. There are many reasons for the high 
rate, including the facts that addresses supplied by 
taxpayers on IRS forms were not always the same as their 
residence address and that the matching study was carried 
out several years after the census and not intended to 
produce coverage estimates. Nonetheless, some insight 
can perhaps be gained when the IRS sample is categorized 
along several dimensions and gross omission rates for 
subgroups are compared with the average for the entire 
sample. 

In the IRS-Census Match study, blacks and Hispanics 
exhibited moderately high relative gross omission rates 
(category 3), while the rate for non-Hispanic whites fell 
into the average category (category 4). Adding the dimen- 
sion of sex increases the dispersion, with black males 
having a high relative gross omission rate (category 2) 
and white females a below-average rate (category 5). 
These findings are consistent with those from the 
demographic analysis. 

The IRS-Census Match provides data on gross omission 
rates for two important dimensions: marital status 
(proxied by type of return, single or joint) and income 
(adjusted gross income reported to the IRS). These 
dimensions help identify hard-to-count groups (see Table 
5.7). Persons filing single returns had moderately high 
gross omission rates, while persons filing joint returns 
had below-average rates. Cross-tabulating type of return 
with ethnicity gives the result that, while black single 
return filers had a high relative gross omission rate, 
black joint return filers fell into the average category. 
Similarly, white single return filers had a moderately 
high relative gross omission rate, while white joint 
return filers were below average. Type of return did not 
discriminate to any important extent among the Hispanic 
group. 

Adding the dimension of income refines the picture of 
hard-to-count groups. Black single return filers had 
high relative gross omission rates regardless of income 
level; however, income discriminated among black joint 
return filers, with those reporting less than $8,000 
income showing a high relative rate of gross omissions 
but those reporting $15,000 or more income a below- 
average rate. Among whites, those filing single returns 
with reported income under $15,000 and those filing joint 
returns with income under $8,000 fell into the category 
of moderately high relative gross omission rates, while 
the remainder fell into the below-average category. For 
Hispanics filing joint returns, the cutting point between 
high relative gross omission rates and average rates was 
an income level of $15,000; however, income level did 
not discriminate among Hispanics filing single returns to 
any great degree. 

TABLE 5.7 Ethnicity, Type of Return, and Income by Relative Gross Omission 
Rate for a Sample of Income Tax Filers Ages 18 to 64, IRS-Census Match (1980) 

Relative Gross 
Omission Rate      Ethnicity and Type            Income in 1979, Type of Return, 
Category(a)        of Return                     and Ethnicity(b) 

Very high 
High               Black: Single return          Under $8,000, single return:     Black 
                   Hispanic: Single return                                        Hispanic 
                                                 Under $8,000, joint return:      Black 
                                                                                  Hispanic 
                                                 $8,000-14,999, single return:    Black 
                                                 $8,000-14,999, joint return:     Hispanic 
                                                 $15,000 or more, single return:  Black 
                                                                                  Hispanic 
Moderately high    Black: Total filers           Under $8,000, single return:     White 
                   Hispanic: Joint return        Under $8,000, joint return:      White 
                             Total filers        $8,000-14,999, single return:    Hispanic 
                   White: Single return                                           White 
                                                 $8,000-14,999, joint return:     Black 
Average            Black: Joint return           $15,000 or more, joint return:   Hispanic 
                   White: Total filers 
Below average      White: Joint return           $8,000-14,999, joint return:     White 
                                                 $15,000 or more, single return:  White 
                                                 $15,000 or more, joint return:   Black 
                                                                                  White 

NOTE: The average gross omission rate for the 1980 IRS-Census Match was 12.6%. 

(a) Categories of relative gross omission rates are as follows: 
(1) Very high: greater than or equal to 3 times the average rate. 
(2) High: greater than or equal to 2 times and less than 3 times the average rate. 
(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 
(4) Average: greater than .75 and less than 1.25 times the average rate. 
(5) Below average: less than or equal to .75 times the average rate. 

(b) The three ethnicity categories shown are exhaustive: black non-Hispanic, Hispanic of all races, 
and white and other race non-Hispanic. 

SOURCE: Childers and Hogan (1984a:Tables 1 and 2). 



Findings From Previous Census Coverage Evaluation Programs 

Coverage evaluation programs for previous censuses provide 
additional information about groups in the population 



232 

TABLE 5.8 Household Relationship by Relative Gross Omission Rate for a 
Sample of Persons, CPS-Census Match (1960) 

Relative Gross Omission Rate Category    Household Relationship 

Very high                                Brother- or sister-in-law 
                                         Group quarters 

High                                     Son- or daughter-in-law 
                                         Other relative 
                                         Nonrelative 
                                         Grandson or granddaughter 

Moderately high                          Relationship not reported 
                                         Mother or father 
                                         Mother- or father-in-law 
                                         Brother or sister 

Average                                  Son or daughter 
                                         Head 

Below average                            Wife 

NOTE: The average gross omission rate for the 1960 CPS-Census Match was 6.5%. 

a Categories of relative gross omission rates are as follows: 

(1) Very high: greater than or equal to 3 times the average rate. 

(2) High: greater than or equal to 2 times and less than 3 times the average rate. 

(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 

(4) Average: greater than .75 and less than 1.25 times the average rate. 

(5) Below average: less than or equal to .75 times the average rate. 

SOURCE: Bureau of the Census (1964a:Table 19). 



that are more apt to be undercounted compared with other 
groups. This appendix reviews the findings of post- 
enumeration surveys and resident observation but does not 
discuss demographic analysis. The chapter text reviews 
the estimates of net undercoverage provided by demographic 
analysis for the 1950, 1960, and 1970 censuses. 



Post-enumeration Surveys 

Post-enumeration surveys conducted in previous censuses 
provide data on relative rates of undercoverage for various 
population groups. Tables 5.8 through 5.11 show 
relative gross omission rates for the population categorized 
along several dimensions from the results of the 
match of the April 1960 Current Population Survey to 
census records and of the match of the Post-Enumeration 
Survey conducted in summer 1950 to 1950 census records. 
(The dimensions shown were chosen to try to present 
estimates based on large enough sample sizes for reliability.) 
The relative gross omission rate experienced by 






TABLE 5.9 Employment Status by Sex and Income by Race by Relative Gross 
Omission Rate for a Sample of Persons, CPS-Census Match (1960) 

Relative Gross                                        Race and Income in 1959 
Omission Rate     Sex and Employment Status           (males 14 years and over 
Category          (persons 14 years and over) b       with income) c 

Very high 

High              Female: Unemployed                  Nonwhite: Income under $7,500 
                  Male:   Agricultural (Ag.)                    Total male 14 years and 
                            wage worker                           over with income 

Moderately high   Male:   Not in labor force          Nonwhite: Income $7,500 or more 
                          Unemployed                  White:    Income under $5,000 

Average           Female: Total                       White:    Income $5,000-9,999 
                          Nonag. wage worker                    Total male 14 years and 
                          Nonag. self-employed                    over with income 
                          Not in labor force 
                  Male:   Total 
                          Nonag. wage worker 

Below average     Male:   Ag. self-employed           White:    Income $10,000 or more 
                          Nonag. self-employed 


NOTE: The average gross omission rate for the 1960 CPS-Census Match was 6.5%. 

a Categories of relative gross omission rates are as follows: 

(1) Very high: greater than or equal to 3 times the average rate. 

(2) High: greater than or equal to 2 times and less than 3 times the average rate. 

(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 

(4) Average: greater than .75 and less than 1.25 times the average rate. 

(5) Below average: less than or equal to .75 times the average rate. 

b Groups not shown because of small sample size are male agricultural and nonagricultural unpaid 
worker; female agricultural wage worker, self-employed, and unpaid worker; and female non- 
agricultural unpaid worker. 
c All income amounts are in 1979 dollars (1959 figures times 2.5). 

SOURCE: Bureau of the Census (1964a:Tables 28 and 43). 



the entire population was 6.5 percent in the 1960 CPS- 
Census Match and 2.2 percent in the 1950 Post-Enumeration 
Survey. (The lower rate for 1950 attests to the defici- 
encies of a "pure" post-enumeration survey in which 
enumerators are sent out to recanvass an area.) 
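
The five relative-rate categories defined in the table notes can be expressed as a short rule. The following is a minimal sketch in Python, not part of the original report; the function name and the sample group rates are hypothetical, and only the thresholds and the 6.5 percent average come from the text.

```python
def relative_rate_category(group_rate, average_rate):
    """Classify a group's gross omission rate relative to the average rate,
    using the thresholds given in the notes to Tables 5.8 through 5.11."""
    if average_rate <= 0:
        raise ValueError("average_rate must be positive")
    ratio = group_rate / average_rate
    if ratio >= 3:
        return "Very high"
    if ratio >= 2:
        return "High"
    if ratio >= 1.25:
        return "Moderately high"
    if ratio > 0.75:
        return "Average"
    return "Below average"  # ratio <= 0.75


# Hypothetical group rates compared with the 1960 average of 6.5 percent.
for rate in (19.5, 13.0, 8.125, 6.5, 4.875):
    print(rate, relative_rate_category(rate, 6.5))
```

Because the categories are defined on the ratio to the average, the same group rate can fall into different categories in different studies: a rate of 6.5 percent is average against the 1960 figure but roughly three times the 1950 average of 2.2 percent.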

On the dimension of household relationship, both 1960 
and 1950 data support the findings from the 1980 PEP, 
namely that persons not belonging to the nuclear family 
were harder to find than household heads, spouses, and 
their children. Nonrelatives including residents of 
group quarters were particularly difficult to count (see 
Tables 5.8 and 5.10). 

Looking at relative gross omission rates by extent of 
education, data from 1950 indicate that persons with 
education not reported exhibited a very high relative 
gross omission rate, while persons with 6 or fewer years 






TABLE 5.10 Household Relationship and Education Level by Relative Gross 
Omission Rate for a Sample of Persons, Post-Enumeration Survey-Census Match 
(1950) 

Relative Gross 
Omission Rate                                 Education Level 
Category a        Household Relationship      (persons 25 years and over) 

Very high         Nonrelative                 Education not reported 
High 
Moderately high   Other relative              6 or fewer years of school completed 
Average           Head                        More than 6 years of school completed 
                  Wife 
Below average     Son or daughter 



NOTE: The average gross omission rate for the 1950 Post-Enumeration Survey-Census Match was 
2.2%. 

a Categories of relative gross omission rates are as follows: 

(1) Very high: greater than or equal to 3 times the average rate. 

(2) High: greater than or equal to 2 times and less than 3 times the average rate. 

(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 

(4) Average: greater than .75 and less than 1.25 times the average rate. 

(5) Below average: less than or equal to .75 times the average rate. 

SOURCE: Bureau of the Census (1960:Tables E and 4). 



of schooling exhibited a moderately high rate. The gross 
omission rate for persons with more than 6 years of 
schooling fell into the average category (see Table 5.10). 
Comparable data are not available from 1960. 

Both the 1960 and 1950 census coverage evaluation 
programs furnish data on differential rates of undercount 
among the population classified by labor force status, 
occupation, and income levels (see Tables 5.9 and 5.11). 
In 1960, unemployed women had a high relative gross 
omission rate, and men who were unemployed or not in the 
labor force had moderately high rates. Looking at 
employed persons, male agricultural paid laborers had a 
high relative gross omission rate, while the rate was 
below average for self-employed men both in farming and 
other lines of business. 

On the dimension of income, looking only at males, 
there is a clearer picture for whites compared with 
nonwhites in 1960. In the case of white males, those 
with low income had a moderately high relative rate of 
gross omissions, while those with high income had a 
below-average rate. In contrast, income did not 
discriminate to any important extent among nonwhite males. 




TABLE 5.11 Occupation by Sex and Income by Relative Gross Omission Rate 
for a Sample of Persons, Post-Enumeration Survey-Census Match (1950) 

Relative Gross 
Omission Rate     Sex and Occupation                  Income in 1949 
Category          (persons 14 years and over) b       (males 14 years and over) c 

Very high         Female: Farm laborer and 
                            unpaid worker 
                  Male:   Farm laborer 

High              Female: Private household worker    Income not reported 
                  Male:   Nonfarm laborer 

Moderately high   Female: All other occupations d     Income under $3,000 
                  Male:   Farm unpaid worker 

Average           Female: Not employed                Income $3,000-10,499 
                  Male:   Not employed                Total male 14 and over 
                          All other occupations e 

Below average     Female: Sales worker                Income $10,500 and over 
                  Male:   Farmer and farm manager 


NOTE: The average gross omission rate for the 1950 Post-Enumeration Survey-Census Match was 
2.2%. 

a Categories of relative gross omission rates are as follows: 

(1) Very high: greater than or equal to 3 times the average rate. 

(2) High: greater than or equal to 2 times and less than 3 times the average rate. 

(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 

(4) Average: greater than .75 and less than 1.25 times the average rate. 

(5) Below average: less than or equal to .75 times the average rate. 

b Excludes male private household worker, female farmer, and female nonfarm laborer categories 
because of small sample size. 

c All income amounts are in 1979 dollars (1949 figures times 3). 

d Includes professional and technical, nonfarm manager, clerical, crafts, operative, and service 
worker categories. 

e Includes categories listed in note d plus sales worker category. 

SOURCE: Bureau of the Census (1960:Tables 6A, 6B, and 9B). 

The 1950 data, which do not include race breakdowns, 
support the general patterns evident in 1960 on occupation 
and income. Persons with very high relative rates of 
gross omissions include farm laborers of both sexes and 
female unpaid farm workers. In contrast, male farmers 
and also female sales workers had below-average rates 
(female farmers were excluded because of very small 
sample size). Male nonagricultural laborers and female 
private household workers fell into the next category of 
high relative gross omission rates. On the dimension of 
income, low income is associated with a moderately high 
relative rate of gross omissions, while high income is 
associated with a below-average rate. Note that persons 
with various characteristics, such as income, not reported 




in the 1950 study tended to have very high or high relative 
gross omission rates, indicating that these persons 
represented a generally hard-to-count group. 

Finally, tables are not shown by type of area or region 
of the country, as these dimensions did not discriminate 
significantly in either 1960 or 1950 on relative rates of 
gross omissions. In 1960 the population of urban areas, 
rural nonfarm, and rural farm areas all exhibited average 
relative gross omission rates (Bureau of the Census, 
1964a:Table 8). In 1950 persons living in urban areas 
had an average rate of gross omissions, persons 
classified as rural nonfarm had a moderately high 
relative rate, and persons classified as rural farm had a 
below-average rate. By region of the country in 1950, 
the Northeast, North Central, and West regions had 
average rates, while only the South had a moderately high 
relative rate of gross omissions. Within the South, 
rural nonfarm areas were hardest to count, with a high 
relative rate of gross omissions (Bureau of the Census, 
1960:Table F). 

Comparable data were not published for 1970, but unpublished 
data from a match of the April Current Population 
Survey to the census records indicate the following 
patterns. (These findings should be viewed as suggestive 
only, however, because of high variances associated with 
the estimates.) First, employed whites had a below- 
average gross omission rate, but employed blacks had a 
moderately high relative rate. Similarly, higher-income 
whites had a below-average gross omission rate, but this 
was not true for higher-income blacks. Unemployed and 
low-income persons of both races had average rates of 
gross omissions. Finally, rates of gross omissions were 
higher in rural than in other types of areas (Siegel, 
1975:8-9). 



Resident Observer Studies 

The techniques of resident observation employed in 
ethnographic studies were used on one occasion to study 
factors affecting the coverage of household surveys. 
Charles A. Valentine and Betty Lou Valentine, who were 
trained anthropologists, conducted a resident ethnographic 
study of a predominantly black inner-city community in 
1968-1970, partially under the sponsorship of the Center 
for Research in Measurement Methods of the Census Bureau 
(Valentine and Valentine, 1971). Interviewers for the 
Current Population Survey, the Health Interview Survey, 




and the Quarterly Housing Survey conducted interviews in 
the area at the time when the Valentines had been in 
residence for approximately one year. The interviewers 
were unaware that the Valentines were studying the area. 
The Valentines independently identified the residents of 
a number of households in the area and ultimately compared 
their independently derived data on household composition 
for a total of 25 dwelling units with the corresponding 
data as reported by the survey interviewers. About 
three-fourths of the households were black and one-fourth 
Hispanic. The families lived in substandard housing, 
some lacking basic facilities. All families were judged 
to have very low incomes relative to the cost of living 
for the area. The Valentines described the community as 
"a typical polyethnic inner-city slum." 

For the 25 households, the surveys reported a total of 
127 individuals, whereas the Valentines identified 153 
individuals as being associated with the dwelling units. 
Therefore, the survey procedures produced a 17 percent 
undercount relative to the count obtained by the resident 
observers. The most striking result was that the ethno- 
graphic evidence suggested that 61 percent (17 of 28) of 
the males age 19 and older were not counted by the survey 
procedures. The Valentines described the missed men as 
regularly residing in the households. The men contributed 
to financial support, took part in domestic activities, 
and shared in child-rearing. In nine of the households, 
the men were legally married and living with their 
spouses. The remaining common-law unions were relatively 
permanent and most were intact two years after the study. 
The Valentine estimates provided a much more realistic 
sex ratio than did the interview results. 
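
The two percentages quoted above follow directly from the counts in the text. A minimal sketch of the arithmetic in Python (the function name is illustrative, not from the study):

```python
def relative_undercount(survey_count, observer_count):
    """Share of the resident observers' count that the surveys missed."""
    return (observer_count - survey_count) / observer_count


# Counts reported for the 25 dwelling units.
overall = relative_undercount(survey_count=127, observer_count=153)

# Males age 19 and older: 17 of the 28 identified were not counted.
adult_males_missed = 17 / 28

print(f"Overall undercount: {overall:.0%}")                  # 26 of 153
print(f"Adult males not counted: {adult_males_missed:.0%}")  # 17 of 28
```

Note that the denominator is the observers' count (153), not the survey count, so the 17 percent figure is an undercount relative to the ethnographic benchmark rather than to the surveys' own total.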

The Valentines described a number of reasons that led 
them to believe that one cannot expect traditional inter- 
view or self-enumeration procedures to identify indi- 
viduals of the type missed in the study area. The 
Valentines felt that the respondents understood the 
questions. They concluded that the men were not reported 
because the identification of resident males in the 
households could be detrimental to the economic welfare 
of the household and that the respondents behaved in a 
consistent manner in failing to report these men. 



GROSS OVERENUMERATIONS OF PEOPLE 
Findings From the 1980 PEP Program 

The whole story regarding coverage problems in the census 
does not emerge solely by looking at gross omissions. It 
is necessary to examine gross overenumerations as well as 
omissions to obtain a complete picture. The PEP developed 
estimates of gross overenumerations in the 1980 census 
through rechecking a sample of census records (the E 
sample) to identify problems such as duplicate records, 
persons enumerated who were not alive on Census Day, and 
so on. As is true for the PEP estimate of gross omissions, 
the estimate of gross overenumerations is overstated. 
Also, the rates cannot be subtracted to give an 
estimate of the net undercount, because they have different 
denominators (see Cowan and Fay, 1984, for further 
explanation). 

Data from the PEP 3-8 series (which is the only series 
for which tabulations for population groups are currently 
available) indicate an overall gross overenumeration rate 
of 3.6 percent in 1980 versus an overall gross omission 
rate of 5.4 percent. Population groups with relatively 
high gross omission rates also tended to have relatively 
high rates of gross overenumerations. However, the 
dispersion in gross overenumeration rates is less than 
the dispersion in gross omission rates. 

Table 5.12, as an illustration, shows relative rates 
of gross overenumeration for ethnicity and household 
relationship in 1980. Blacks and Hispanics on average 
had moderately high relative rates of gross overenumerations, 
as did members of other races. American Indians, 
Asians, white non-Hispanics, and some categories of 
Hispanics, in contrast, were in the average category. 
This pattern is similar to the pattern evidenced in Table 
5.5 for relative gross omission rates, but the dispersion 
is less for gross overenumerations. Similar findings 
emerge for categories of household relationship: household 
members outside the nuclear family had higher rates 
of both gross overenumerations and gross omissions compared 
with nuclear family members but were not as badly 
overenumerated relative to the average as they were 
underenumerated (see Table 5.4). 

The rate of gross overenumerations also varied by type 
of enumeration procedure. Enumerations obtained in mail 
areas by follow-up because the questionnaire was not 




TABLE 5.12 Ethnicity and Household Relationship by Relative Gross 
Overenumeration Rate for a Sample of Persons, Post-Enumeration Program-Census 
Match (1980, PEP Series 3-8) 

Relative Gross 
Overenumeration Rate   Ethnicity 
Category               (detailed categorization)    Household Relationship 

Very high 

High 

Moderately high        Black (non-Hispanic)         Nonrelative 
                       Hispanic: Total              Other relative 
                                 Cuban              Brother or sister 
                                 Puerto Rican 
                       Other race (non-Hispanic) 

Average                American Indian              Head 
                       Asian                        Spouse 
                       Hispanic: Mexican            Son or daughter 
                                 Other              Mother or father 
                       White (non-Hispanic) 

Below average 

NOTE: The average gross overenumeration rate for the 1980 PEP was 3.6%. 

a Categories of relative gross overenumeration rates are as follows: 

(1) Very high: greater than or equal to 3 times the average rate. 

(2) High: greater than or equal to 2 times and less than 3 times the average rate. 

(3) Moderately high: greater than or equal to 1.25 and less than 2 times the average rate. 

(4) Average: greater than .75 and less than 1.25 times the average rate. 

(5) Below average: less than or equal to .75 times the average rate. 

SOURCE: Hogan (1983b:2-3). 



mailed back showed a high relative rate of gross over- 
enumerations, while enumerations resulting from mail 
returns or obtained in conventional areas had below- 
average rates (Cowan and Fay, 1984:6). It is not clear 
how much of the gross overenumeration was due to actual 
double counting and other kinds of erroneous enumerations 
in the census as opposed to problems with unresolved 
cases in the E sample. These problems are known to have 
been worse for groups exhibiting above-average gross 
overenumeration rates. 



Findings from Earlier Censuses 

The 1970 census relied on demographic analysis as the 
primary method for estimating net undercoverage; the 
CPS-Census Match provided estimates of gross omissions of 
persons but not of gross overenumerations. The 1960 Post- 
Enumeration Survey determined gross overenumerations of 




persons and housing units as well as gross omissions, but 
only net undercoverage rates were reported, while the 
1960 CPS-Census Match determined only gross omissions 
(see Chapter 4 for further discussion). 

Only the 1950 Post-Enumeration Survey, of previous 
census coverage evaluation programs, reported the components 
of net population coverage error. As was true for 
the 1980 PEP, the 1950 gross overenumeration estimate of 
0.9 percent is overstated, as is the gross omission 
estimate of 2.2 percent, because the estimates include 
persons counted in the wrong geographic location (see 
Bureau of the Census, 1960). Findings with regard to 
gross overenumerations in 1950 are less clear-cut than 
the findings for 1980. In general, most population 
groups fell into the same categories of relative gross 
overenumeration and gross omission rates. Some groups 
appeared to have been less often overenumerated than 
underenumerated relative to the average rates (as was the 
general pattern in 1980), while a few groups appeared to 
have been more often overenumerated than underenumerated. 
The very different enumeration procedures used in the 
1950 and 1980 censuses make it difficult to compare 
overenumeration experiences. 



HOUSING COVERAGE STUDIES 

Another source of information on relative rates of 
omissions in the census is provided by studies of the 
completeness of coverage of housing units conducted for each 
census since 1950. Of course, rates of omission of 
housing units do not necessarily translate into comparable 
rates of missed persons; nevertheless, data on the 
characteristics of missed units add to the picture of 
hard-to-count elements in the population. Housing 
coverage evaluation studies also provide information on 
gross overenumerations, although the estimates for the 
1980 census evaluation are not comparable with estimates 
from previous censuses because of the use of different 
methods. 

Looking at missed units, the 1950 census estimated 
overall gross omission rate for occupied housing units was 
3 percent; the estimated rate was 2.1 percent in 1960, 
1.4 percent in 1970, and 1.5 percent in 1980. 

Data from 1950 on gross omissions of occupied housing 
units show that rented units as a group exhibited a moderately 
high relative gross omission rate, while owned 




units had a below-average rate. Within the rental cate- 
gory, units for which rent was not reported and with very 
low monthly gross rent had very high relative gross omis- 
sion rates; in contrast, moderate-to-expensive units fell 
into the average category. The smallest units (only one 
room) and also units with number of rooms not reported 
exhibited very high relative rates of gross omissions, 
while the largest units (five or more rooms) had a below- 
average rate. Finally, close to 30 percent of missed 
occupied units were in buildings that were otherwise 
enumerated, while 70 percent were in buildings that were 
missed entirely (Bureau of the Census, 1960:Tables I, 11, 
and 15). 

By region of the country or type of place (urban versus 
rural or metro versus nonmetro) , there were no important 
differences in relative rates of gross omissions for 
occupied housing units in 1950 (Bureau of the Census, 
1960:Table K). This was also true for 1960, 1970, and 
1980. In 1960, the South had a moderately high relative 
rate of gross omissions, as did areas outside SMSAs. In 
1970, the South's rate of gross omissions fell into the 
average category, while nonmetropolitan areas remained in 
the moderately high category. However, without special 
coverage efforts in the South in 1970, specifically a 
post-enumeration post office check of the address list, 
the South would have had a moderately high relative rate 
of gross omissions (Bureau of the Census, 1973c:Tables F 
and G). In 1980, rural areas had a moderately high 
relative rate of gross omissions and the West region and 
areas enumerated with conventional methods had below- 
average rates (Bureau of the Census, 1985a:Table 2). 

In 1960 about 40 percent and in 1970 about 30 per- 
cent of missed occupied units were in buildings that were 
otherwise enumerated, with the remainder in structures 
that were missed entirely. Table 5.13 shows the per- 
centage distribution of missed units by the enumeration 
status of the structure for type of place in 1960 and 
1970 (comparable data are not available for 1980). A 
clear shift is evident from 1960 to 1970 in the distribu- 
tions by area type, presumably due to the introduction of 
new procedures for developing address lists for using 
mailout-mailback enumeration procedures in 1970. The 
shift is toward a higher percentage of missed units 
within otherwise enumerated structures in central city 
areas in 1970 compared with 1960 and lower percentages 
for other metropolitan areas and areas outside SMSAs. 
Data not shown indicate that in 1970 one-half of the 
units missed within structures were in structures 




TABLE 5.13 Percentage of Gross Omissions by Enumeration Status of the 
Building and Type of Area, for Samples of Occupied Units, CPS-Census Match 
(1970 and 1960) 

                    1960 Percentage of         1970 Percentage of 
                    Occupied Units Missed      Occupied Units Missed 
                    in Buildings               in Buildings 

Type of Area        Enumerated    Missed       Enumerated    Missed 

Total a                38.1        61.9           28.6        71.4 
Inside SMSA b          47.4        52.6           46.2        53.8 
  Central city         54.5        45.5           66.7        33.3 
  Other                33.3        66.7           25.0        75.0 
Outside SMSA c         27.6        72.4           10.0        90.0 


a 1970 percentages are calculated from Table G, Part A, "After Processing," based on the 1970 
CPS-Census Match and assuming that processing changes reduced the miss rate in missed 
buildings but not in enumerated buildings. 

b 1970 percentages are calculated from Table F, Part B , based on the 1970 Coverage Evaluatioi 

Mail Areas. 

c 1970 percentages are calculated from Table F, Part A, "After Processing," based on the 1970 CPS- 
Census Match and assuming that processing changes reduced the miss rate in missed buildings but 
not in enumerated buildings. 

SOURCE: Bureau of the Census (1973c:calculated from Tables F and G). 



classified as single-unit addresses on the mailing list, 
with another one-third in structures classified as having 
two to four units. Three-fourths of occupied units missed 
in 1970 were in structures built before 1939 (Bureau of 
the Census, 1973c:17). 

Data from a study of housing units in the E sample of 
the 1980 census Post-Enumeration Program that contained 
at least one duplicated person provide an estimate that 
0.9 percent of units were duplicated. This estimate, 
although based on methodology that the Census Bureau 
believes to be superior to the methodology used in pre- 
vious censuses, is an underestimate because it excludes 
various other kinds of housing unit overenumerations. 
Looking at relative duplication rates, the South had a 
moderately high rate of housing unit duplications in 1980, 
as did mail areas where the address list was developed by 
Census Bureau staff (prelist areas), rural areas, and 
nonmetropolitan areas. Conventionally enumerated areas 
and the Midwest had below-average housing unit duplication 
rates (Bureau of the Census, 1985a:Table 4). The 1980 
study estimated (Bureau of the Census, 1985a:30) that in 
88 percent of the duplicated units the entire household 
was duplicated, while in the remaining duplicated units 
only some household members were duplicated. 



6 

Taking the Census II: 

The Uses of Sampling and 

Administrative Records 



The charge to the panel called for assessment of the uses 
of sampling and administrative records to improve the 
cost-effectiveness of the decennial census. Recent 
census methodology has incorporated both of these tech- 
niques into one or more aspects of census operations, but 
there may well be room to extend their use. This chapter 
evaluates a range of uses of sampling for obtaining the 
count and characteristics and also considers the joint 
use of administrative records and sampling to improve the 
quality of certain census items. 



SAMPLING IN THE CENSUS 

Sampling has been employed since the 1940 census to obtain 
additional useful data without the burden and expense of 
asking all questions of the entire population. Sampling 
has also been used as part of census operations for pur- 
poses of quality control and has been extensively used in 
postcensal programs of coverage and content evaluation. 
Recently, it has been suggested that sampling could prove 
cost-effective in helping to fulfill the basic purpose of 
the decennial census: obtaining the population count for 
the nation, states, and small areas. 

The panel examined the merits of the following poten- 
tial uses of sampling for the count: (1) taking a sample 
census, (2) conducting follow-up of a sample of households 
that do not return their questionnaires, and (3) imple- 
menting coverage improvement programs for hard-to-count 
areas and population groups on a sample basis. The panel 
also reviewed issues in sampling for content, including 
criteria for deciding when to include questions on the 
short form administered to 100 percent of the population 


and when to include questions on one or more long forms 
administered to a sample. The panel also considered the 
merits of a follow-on sample survey to obtain detailed 
information. Finally, the panel reviewed the use of 
sampling in conjunction with administrative records for 
verification and improvement of the quality of certain 
items collected in the census. Sampling is also discussed 
in Chapter 8 in the context of coverage evaluation 
methods. Because we believe that the use of sampling in 
an operational context for quality control is a well- 
understood application, we do not comment on this use 
of sampling in the census, despite the great importance 
of careful control of all aspects of census procedures. 

The panel reviewed two papers prepared by staff of the 
Census Bureau outlining proposed research on uses of 
sampling for the head count and content in 1990. The 
paper by Miskura et al., "Uses of Sampling for the Census 
Count" (1984), describes four applications of sampling 
for the decennial census and proposes research plans 
for each type of use: (1) obtaining the census count on 
a sample basis, (2) using sampling for follow-up of 
nonresponse in the census, (3) using sampling for coverage 
improvement operations, and (4) using sampling for 
verification and possible correction of specific content 
items during the census. 

A package of papers prepared in summer 198< 
Census Manager Reports on 1986 Pretest Objecti e 
(Johnson, 1984) , describes proposals for the i 
1990 census pretests to be conducted in sprinc 
Proposed tests that involved the use of samplj 
count or content included: (1) a split panel 
sampling for unit nonresponse follow-up, (2) s 
the use of sampling for one of the coverage in 
programs the Vacant/Delete Check, and (3) tes 
general-purpose follow-on survey of 1-2 percen 
short-form recipients conducted a few months 
completion of the census enumeration. However 
Census Bureau has dropped the first two projec .s 
from the current 1986 pretest plans (see Burea < 
Census, 1985b) . 

Any evaluation of the costs and benefits of a par- 
ticular sampling procedure must endeavor to assess the 
gains or losses on several dimensions compared with an 
alternate procedure. The comparison procedure could be 
complete enumeration or another variant of sampling, for 
example, the use of a larger or smaller sampling rate 
or a different sample selection procedure. Criteria 
considered by the panel include: 




(1) Accuracy of the information obtained. Errors in 
surveys traditionally are thought of as having two com- 
ponents: sampling and nonsampling. Sampling error is 
inherent in sample surveys and necessarily increases the 
random variation of observed values from true values, 
compared with a complete enumeration. Nonsampling error 
may arise from question wording, field techniques, or 
many other sources and can occur both in samples and in 
complete enumerations. It is possible that a well- 
designed and executed sampling operation can reduce 
nonsampling error compared with a complete enumeration 
because the staff may be better trained and procedures 
more uniformly applied. It is, of course, also possible 
for the sample survey design to introduce nonsampling 
error. Furthermore, certain components of nonsampling 
error appear as variances that decrease with increasing 
sample size. 

(2) Cost. In the context of the decennial census, 
which cost over $1 billion in 1980, the cost impact of 
any proposed methodology is an important consideration. 
Sampling is usually expected to reduce costs compared 
with a complete count, and small samples are expected to 
cost less than large samples; however, this is not always 
the case. The use of sampling introduces costs associated 
with sample design, selection of the sample, quality con- 
trol of the sampling operation, processing of the data to 
estimate universe totals, and assessment of the quality 
of the information obtained. 

(3) Timing. The length of time required for an opera- 
tion is important in the census context. Shortening the 
time between Census Day and completion of the enumeration 
has positive implications for cost savings, for earlier 
availability of the data, and for improved accuracy of 
the numbers. (For example, the shorter the field opera- 
tion, the less opportunity there is to miscount movers.) 
Sampling may have the benefit of reducing the calendar 
time required to complete the census field work. 

(4) Feasibility. The enormous scale of census opera- 
tions places a high premium on the feasibility of proposed 
methodologies in the field. Sampling may have drawbacks 
on this dimension if it proves more difficult to implement 
a sample operation than to conduct a complete enumeration. 
Because the census is a massive undertaking conducted 
within a brief time span only once every 10 years, there 
are not the opportunities to refine sampling procedures 
and to train field staff afforded in a continuing sample 
survey. 




(5) Respondent burden. Sampling reduces the aggregate 
time the public must spend filling out questionnaires as 
well as the survey costs. Since the decennial census is 
conducted only once every 10 years, the panel does not 
view reducing respondent burden as an important argument 
for increased use of sampling to obtain the basic census 
counts. However, burden reduction has historically been 
an important justification for obtaining responses for 
most content items from samples of households. It is 
possible that greater use of sampling for content, with 
the consequent further reduction in burden, could have 
the benefit of improving the quality of the responses. 

(6) Legislative and political considerations. 
Although the panel was explicitly instructed to set aside 
legal considerations in examining choice of methodology 
for the decennial census, such considerations cannot be 
totally ignored. At present, clear legislative authority 
exists for the Census Bureau to use sampling to obtain 
answers to any and all items on the form, but there is a 
question whether this authority would extend to the use 
of sampling in determining population head counts for 
purposes of congressional reapportionment. 

The decision to adopt a particular application of 
sampling in the decennial census must rest on a careful 
assessment of the net gain or loss (compared with the 
alternatives) on each of the above dimensions. Because 
an assessment is unlikely to show net pluses on every 
dimension (or net minuses for that matter), it will be 
necessary to make trade-offs and to answer hard questions 
such as how much reduction in accuracy is tolerable to 
achieve a specified level of cost savings. Quantification 
of the relative importance of the dimensions is difficult. 
In considering proposed changes in methodology, the panel 
has attempted to make explicit the degree to which various 
factors are affected. 



SAMPLING FOR THE COUNT 

The panel reviewed several possible applications of 
sampling for the count, ranging from replacement of the 
census with a large sample survey to the use of sampling 
in the final stages of follow-up. The panel concluded, 
for a variety of reasons, that sampling appears more 
likely to be cost-effective at the end of the census 
process than in the earlier stages. The panel supports 
further research directed toward evaluating the merits of 
limited use of sampling as part of the census enumeration 
process. 



Taking a Sample Census 

Currently, decennial census methodology involves collect- 
ing the majority of population and housing characteristics 
from a sample of households, who receive the "long-form" 
census questionnaire. (Sample sizes for the long-form 
items in recent censuses have ranged from 3.3 to 50 per- 
cent and are typically 20 or 25 percent.) However, counts 
of persons and housing units and basic characteristics, 
such as age, race, sex, and marital status, are obtained 
from 100 percent of the population. 

The concept of taking a sample census, i.e., taking a 
large sample survey instead of a full census to obtain 
the count of the population and related basic character- 
istics, has been suggested as a means to effect a reduc- 
tion in costs while satisfying the primary information 
needs served by a full census (see, for reference, Bureau 
of the Census, 1982a; Kish, 1979). Kish has also 
suggested, as a variant on the basic concept of a sample 
census, taking "rolling samples," whereby a different 
fraction of households is enumerated each year (Kish, 
1979; Congressional Research Service, 1984:175). 

Miskura et al. (1984) propose several research projects 
intended to result in a possible design for a sample 
census. These include projects to develop appropriate 
sampling error estimates for alternative designs, to 
develop total error models (including sampling and 
nonsampling error) and to investigate the theoretical 
reduction in nonsampling error required to obtain overall 
accuracy at least equal to that of a complete count, and 
to develop cost models and estimate their parameters for 
a sample census. At present, however, the Census Bureau 
has no plans to proceed with extensive research on a 
sample census, a decision the panel supported in its 
interim report (National Research Council, 1984:Ch. 2). 



Problems Involved in a Sample Census 

The panel believes that the concept of replacing the 
census with a large sample survey should be excluded from 
the Census Bureau's 1990 research and testing program for 
a number of reasons that relate principally to census 
purposes, costs, and coverage. 




The decennial census is the only comprehensive source 
of data for very small geographic areas such as 
census tracts and city blocks (see the discussion in 
Chapter 2). There are important needs for data for 
small areas, including: redistricting of national, state, 
and local legislative districts, which requires population 
counts by race to meet court-mandated criteria for the 
size and compactness of districts (Bureau of the Census, 
no date-b), and revenue sharing, which requires population 
and income data for 39,000 political jurisdictions that 
include many very small towns, villages, and special 
districts. Small-area census data are also used for 
planning and by the private sector for many purposes. 
Moreover, the model-based estimation techniques 
used to produce small-area data postcensally for revenue 
sharing and other purposes are recalibrated periodically 
against the census. 

To obtain small-area population counts and char- 
acteristics from a sample survey to satisfy the needs 
outlined above with an acceptable level of accuracy would 
require a large sampling rate, probably 50 percent or 
greater for small jurisdictions. Moreover, it would not 
be acceptable to design a clustered area sample that 
included the population of only some geographic areas, 
such as selected counties or cities, because small-area 
data are needed for every political jurisdiction in the 
country. Hence, it would not be possible to reduce the 
number of field offices and thereby effect significant 
savings in administrative overhead costs. Moreover, 
while the size of the interviewer staff could be reduced 
somewhat, a large sample survey would entail added 
costs for drawing and controlling the sample. 
To select a large unclustered sample would probably 
require complete address listing. Given a large sampling 
rate, an unclustered design, and 100 percent address 
listing, the panel is doubtful that costs could be 
significantly reduced in comparison with a full census. 

The panel has reviewed Census Bureau cost estimates 
prepared in the mid-1970s for conducting a mid-decade 
census on a sample basis compared with a complete enumera- 
tion. These estimates appear to bear out the conclusion 
that there would be only minor cost savings in a sample 
on the scale necessary for satisfaction of present needs 
for small-area data (see Appendix 6.1 for details). 

Finally, there is the issue of completeness of coverage 
obtained by a large sample survey compared with a 
census. There is a large body of evidence in both the 
United States and other countries that the census obtains 
more complete population coverage than even the best- 
executed sample survey (Redfern, 1983; Yuskavage et al., 
1977). Furthermore, the coverage deficiency of sample 
surveys relative to censuses affects differentially 
precisely those population groups that are least well 
counted by the census in the first place. In fact, even 
the samples taken in conjunction with the census generally 
produce lower population figures than the complete census 
(Waksberg et al., 1973). One possible reason for this 
finding is that the publicity surrounding a census elicits 
greater cooperation from the public than can be obtained 
in surveys. While, of course, the Census Bureau would 
mount a publicity campaign for a sample census, it would 
be difficult to include a question like "Were you 
counted?" when only a fraction of the population is 
supposed to respond. Similarly, the field operations of 
a census, including follow-up and special coverage 
improvement programs, are geared toward finding every 
housing unit and person and adding missed units to the 
address list developed in advance of the census. For a 
sample census, it is unlikely that the same effort would 
or could be put into adding units to the sampling frame, 
and less complete coverage may result. 

The less complete coverage obtained by a sample census 
compared with current methodology would have adverse 
implications for many important uses of census data. 
Concerns about inequities resulting from differential 
undercoverage of important subgroups of the population 
are already very strong. Substituting a large sample 
survey for the census would deepen these concerns still 
further and probably with every good reason, given, as 
we noted before, that sample surveys appear to undercount 
even more disproportionately precisely those population 
groups already disproportionately undercounted by the 
census. The decennial census is also used as the basis 
for the design of current surveys in both the public and 
private sectors and to benchmark current population esti- 
mates. Less complete coverage would adversely affect 
these uses of census information as well. 

We believe that rolling samples would also suffer from 
the disadvantages just discussed for a large-scale decen- 
nial sample survey compared with a complete enumeration, 
namely less complete coverage and either significantly 
reduced reliability of small-area data or only modest 
cost savings. Rolling samples may offer some advantages, 
such as improved ability to recruit and retain high- 
quality field staff, but have the added disadvantage that 
data are not available for comparative analysis across 
areas and population groups at a point in time. As 
described in Chapter 2, many uses of census data, includ- 
ing redistricting, fund allocation, and public policy 
analysis, depend on cross-sectional measures. 

Recommendation 6.1. We recommend that the Census 
Bureau not pursue research or testing of a sample 
survey as a replacement for a complete enumeration in 
1990. 



The Use of Sampling for Follow-up 

Another proposed use of sampling for the count is to 
sample in the follow-up stage of census operations as a 
means of reducing costs (Bureau of the Census, 1982c, 
1983a; Ericksen and Kadane, 1985; General Accounting 
Office, 1982). A census carried out with the use of 
sampling for follow-up could, for example, at a specified 
date after Census Day, draw a sample of addresses from 
which a completed census form had not been returned and 
follow up only those addresses. The total number of 
housing units and persons represented by the cases that 
were followed up would then be estimated and added to the 
number that returned questionnaires in the mail. The 
Miskura et al. paper outlines research projects intended 
to provide a sound methodological basis for designing 
follow-up operations to be carried out for a sample of 
nonresponding units. These projects are similar to those 
proposed in connection with conducting the entire census 
on a sample basis, namely to develop sampling error 
estimates and total error models for alternative sampling 
designs. These research endeavors were expected to lead 
to a pretest of sampling for follow-up and such a pretest 
was included in the Census Bureau's initial plans for 
1986 (Johnson, 1984). The test would have used a split 
panel design, whereby census field staff in half the 
enumeration districts would follow up every household not 
returning a questionnaire, but follow up only a sample of 
nonresponding households in the remaining districts. 
Unfortunately, given the realization that not all objec- 
tives could be tested with a limited number of sites, the 
Census Bureau decided that other objectives took higher 
priority and dropped the test of sampling for follow-up 
in 1986. 
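
The estimation step described above (follow up a sample of 
nonresponding addresses, weight the results up to the full 
nonresponding pool, and add the mail returns) can be 
sketched as follows. This is an illustration only, assuming 
a simple random sample and a simple expansion estimator; the 
function name, the parameters, and the callback 
enumerate_unit are hypothetical and do not describe any 
Census Bureau procedure. 

```python
import random


def estimate_census_total(mail_count, nonresponding, sampling_fraction,
                          enumerate_unit, seed=0):
    """Estimate a total count when only a sample of nonresponding
    addresses is followed up (hypothetical illustration).

    mail_count        -- persons counted on mailed-back questionnaires
    nonresponding     -- list of addresses with no returned form
    sampling_fraction -- share of nonresponding addresses to follow up
    enumerate_unit    -- callback returning persons found at an address
    """
    rng = random.Random(seed)
    sample_size = max(1, round(len(nonresponding) * sampling_fraction))
    # Simple random sample without replacement (an assumption).
    sample = rng.sample(nonresponding, sample_size)
    # Each followed-up address represents this many nonresponding addresses.
    weight = len(nonresponding) / sample_size
    persons_in_sample = sum(enumerate_unit(address) for address in sample)
    # Weighted follow-up estimate added to the mail-return count.
    return mail_count + weight * persons_in_sample
```

The estimate is subject to sampling error that a complete 
follow-up would not have; late mail returns arriving after 
the sample is drawn are ignored in this sketch. 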




Problems Involved in Sampling for Follow-up 

The panel believes that the use of sampling for 
follow-up has some of the same drawbacks as the use of 
sampling for the entire census. The Miskura et al. and 
Johnson documents properly observe that, for sampling in 
follow-up operations to be effective, increases in total 
error (sampling plus nonsampling errors) must be counter- 
balanced by comparable cost savings. Because a heavily 
clustered design could not be used, given that follow-up 
operations must be carried out in every geographic area, 
there would be little opportunity to effect sizable sav- 
ings by eliminating entire segments of field operations. 
Moreover, there would be the added costs of drawing and 
controlling the sample. The possibilities of confusion 
caused by a large sampling operation concurrent with the 
census should not be underestimated. For example, mail 
returns will come in after the cutoff date for drawing 
the follow-up sample and would introduce practical field 
problems and problems of integrating the late returns 
with the sample. Careful attention would need to be 
given to the sample design and determination of sampling 
fractions, given the likelihood of large variations in 
initial mail response rates across geographic areas. For 
example, in 1980, Madison, Wisconsin, had a mail return 
rate of over 90 percent, while the rate for the central 
Brooklyn district office was about 55 percent (Ferrari 
and Bailey, 1983:59). Carrying out follow-up operations 
on a sample basis would also add problems for coverage 
improvement and coverage evaluation programs that in- 
volved matching individual records. Furthermore, because 
low mail return rates very often characterize areas with 
relatively high coverage errors, sampling at this stage 
would probably introduce the largest sampling error into 
those estimates that already suffer from the largest 
coverage errors. 
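
The last point can be illustrated with a rough calculation. 
The sketch below assumes simple random sampling of 
nonresponding addresses within a district and uses the 
textbook variance of an estimated total; the district size, 
the sampling fraction, and the standard deviation of persons 
per address are hypothetical, and the mail return rates are 
roundings of the 1980 figures cited above. 

```python
import math


def followup_sampling_se(addresses, mail_return_rate, sampling_fraction,
                         sd_per_unit):
    """Standard error of the estimated number of persons at
    nonresponding addresses under simple random sampling
    (hypothetical illustration)."""
    nonresponding = addresses * (1.0 - mail_return_rate)
    sample_size = nonresponding * sampling_fraction
    # Variance of an estimated total: N^2 * (1 - f) * S^2 / n
    variance = (nonresponding ** 2 * (1.0 - sampling_fraction)
                * sd_per_unit ** 2 / sample_size)
    return math.sqrt(variance)


# Two districts of equal (hypothetical) size, same sampling fraction.
madison_like = followup_sampling_se(100_000, 0.90, 0.25, 1.5)
brooklyn_like = followup_sampling_se(100_000, 0.55, 0.25, 1.5)
ratio = brooklyn_like / madison_like
```

Under these assumptions the district with the lower mail 
return rate carries roughly twice the sampling error in its 
estimated count, because more of its total must be estimated 
from the sample. 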



Sampling in the Final Stages of Follow-up 

In light of the fact that it is never possible to 
obtain a 100 percent follow-up, there may be reason to 
believe that sampling could prove cost-effective in the 
final stages of follow-up operations. It is well known 
that the cost to count an additional person rises sharply 
as one moves toward those people who are harder to locate. 
That is, the per case cost to enumerate people requiring 
multiple follow-ups or special coverage efforts is many 
times the per case cost for those persons who mail back 
their questionnaires (Keyfitz, 1979; National Research 
Council, 1978; see also Chapter 5). 

The administrative and recordkeeping problems asso- 
ciated with the use of sampling are much smaller if 
sampling is used only at later stages of follow-up. For 
example, it is anticipated that a much smaller fraction 
of persons in the final follow-up pool would subsequently 
return their census forms by mail. Certainly the total 
number of individuals for whom records are required is 
smaller if sampling is restricted to the final stages of 
follow-up. Therefore, the selection of the sample and 
recordkeeping could be handled by a smaller number of 
higher-level Census Bureau employees. 

There is also the possibility that the use of sampling 
in the later stages of follow-up could lead to a decrease 
in the nonsampling component of error that would exceed 
the error introduced by sampling, thus resulting in a 
decrease in total error. We can imagine a situation in 
some regional offices in which the personnel who are 
involved in final stage follow-up operations vary greatly 
in their abilities to elicit accurate information from 
the nonresponding units. Total error may be reduced if, 
rather than using the whole field force in follow-up, 
only those interviewers with superior skills are employed 
in a probability sample of the final follow-up cases. To 
the extent that field personnel have differential skill 
levels, and there is reason to believe that qualified and 
dedicated personnel are becoming increasingly difficult 
to hire and retain (Hill, 1984), this approach might have 
payoffs. 
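
The trade-off in the preceding paragraph can be made 
concrete with a small sketch. It treats total error as root 
mean square error, combining a nonsampling bias with a 
sampling standard error. All numbers are hypothetical and 
chosen only to show how a sampled follow-up by more skilled 
interviewers could, in principle, have lower total error 
than a complete follow-up by the whole field force. 

```python
import math


def root_mean_square_error(bias, standard_error):
    """Total error combining nonsampling bias and sampling error."""
    return math.sqrt(bias ** 2 + standard_error ** 2)


# Complete follow-up by the whole field force: no sampling error,
# but a larger (hypothetical) nonsampling bias, in persons.
full_followup = root_mean_square_error(bias=400.0, standard_error=0.0)

# Sampled follow-up by the most skilled interviewers: smaller
# (hypothetical) bias, plus sampling error from the sample.
sampled_followup = root_mean_square_error(bias=150.0, standard_error=250.0)

sample_wins = sampled_followup < full_followup
```

With different inputs the comparison reverses, which is why 
estimates of the error components are needed before any 
conclusion can be drawn. 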



Determining the Final Stages of Follow-up 

In the 1980 census, the first stage of follow-up for 
nonresponding households called for enumerators to make 
as many as four attempts to locate the residents. If no 
one could be found but the housing unit appeared to be 
occupied, the enumerators were instructed to obtain basic 
information from other persons, such as neighbors, 
resident managers, and the like. Census Bureau field 
staff estimate that as many as 98 percent of households 
were enumerated by the end of this first stage. The 
second phase of follow-up included attempts to locate the 
remaining 1 or 2 percent of nonrespondents and implemen- 
tation of special coverage improvement programs such as 
the Vacant/Delete Check and Nonhousehold Sources Program 
discussed in Chapter 5. This second stage also included 
follow-up of households whose questionnaires had an 
unacceptable rate of missing data. 

To obtain appreciable cost savings from sampling in 
the last stages of follow-up, it may be necessary to 
restructure the first and second stages. One possible 
scenario could be to restrict the first stage to perhaps 
two attempts to locate nonrespondents. The second stage 
could then encompass follow-up on a sample basis of the 
remaining nonresponding households, which would represent 
a larger fraction of all households than the second phase 
of the 1980 operation. Clearly, more study is needed 
before recommendations could be formulated. 

It would also be possible, as discussed further below, 
to carry out the checking of vacant units on a sample 
basis in a combined operation with the second stage 
follow-up of nonrespondents. (In fact, the checking of 
vacant units is a particular type of follow-up.) 
Appendix 6.2 presents an illustrative scenario and gives 
crude estimates of possible cost savings. 

Restructuring the first and second stages of follow-up 
in this manner and using sampling for the second stage 
could have beneficial effects on the quality of the 
data. The 1980 census procedures did not include any 
special quality control measures for households enumer- 
ated in the first follow-up stage based on responses of 
other persons such as neighbors (called "last resort" 
cases). If, after a limited number of initial follow-up 
attempts, sampling were initiated with higher-level staff 
and more stringent quality-control measures, there is the 
possibility that better data could be obtained in the 
second stage for a larger proportion of households. 



The Merits of Research on Sampling 

On balance we doubt that sampling the entire pool of 
nonresponding households for follow-up will prove cost- 
effective, but we believe there may be important benefits 
from the use of sampling for households that do not 
respond after one or two follow-up attempts. We urge the 
Census Bureau to study the feasibility of sampling and to 
estimate components of total error in the 1987 cycle of 
pretests. We also advise that maximum use be made of 
information that can be extracted by simulating sampling 
with data from the 1985 and 1986 pretests. The analysis 
should attempt to identify stages of follow-up (first 
round, second round, etc.) and, for each stage, determine 
cost structures and patterns of response, comparing 
across different sized geographic areas and areas with 
differing initial mail response rates. In addition, we 
suggest that the Census Bureau investigate methods for 
making the most effective use of field staffs with 
varying skills and determine if there are new techniques 
that can be applied to reduce the nonsampling component 
of total error. 

Recommendation 6.2. We recommend that the Census 
Bureau include the testing of sampling in follow-up as 
part of the 1987 pretest program. We recommend that 
in its research the Census Bureau emphasize tests of 
sampling for the later stages of follow-up. 

A great deal can be learned about the nonresponse 
phenomenon from an analysis of past records of the number 
of callbacks and the time required to obtain information 
from various housing units. We have urged that this 
analysis be applied to the 1985 and 1986 pretests, for 
which we believe that increased automation should make it 
possible to capture the follow-up history of individual 
households. Analysis of the 1980 census experience would 
also be very useful, but the necessary data were not 
recorded in sufficient detail. 

Recommendation 6.3. We recommend that the Census 
Bureau keep machine-readable records on the follow-up 
history of individual households in the upcoming 
pretests and for a sample of areas in the 1990 census, 
so that information for detailed analysis of the cost 
and error structures of conducting census follow-up 
operations on a sample basis will be available. 



Telephone Follow-up 

We noted with interest the report on the telephone 
follow-up experiment conducted during the 1980 census 
(Ferrari and Bailey, 1983). A sample of units in the 
address lists of seven district offices that were not in 
multiunit structures and had not sent back questionnaires 
by mid-April was selected for telephone follow-up using 
telephone directories organized by address. (In one dis- 
trict office, a sample of units in multiunit structures 
was also drawn.) The nonresponding units not in the 
sample were followed up by enumerators according to stan- 
dard census practice. Preliminary results indicated 
several advantages for the telephone technique, namely 
lower costs per completed interview compared with per- 
sonal follow-up, lower item nonresponse rates for many 
items, and fewer duplicate questionnaires. Refusal rates 
were similar for both techniques. A disadvantage of tele- 
phone follow-up was that the directories lacked listings 
or had out-of-date listings for many addresses. The 
Census Bureau's 1990 census research program includes 
further tests of telephone follow-up in 1986 (Johnson, 
1984; Bureau of the Census, 1985b). 

The report of the 1980 experiment, in addition to 
documenting results, describes in some detail opera- 
tional problems that were encountered in administering 
the experiment. For example, a higher than expected rate 
of return of mail questionnaires after the sample selec- 
tion date reduced the actual sample size of the telephone 
follow-up samples. The regular field office staff and 
the experiment staff also had problems working smoothly 
together in some offices. These kinds of problems may 
affect not only telephone follow-up but also sampling for 
follow-up in general. 

Recommendation 6.4. We support the Census Bureau's 
plans for further testing of telephone follow-up 
procedures in 1986. We recommend that the Census 
Bureau review the implications for sample-based 
follow-up operations of the operational difficulties 
that were encountered in the 1980 telephone experiment. 



Sampling for Coverage Improvement 

Along with proposals to follow up a sample of nonrespon- 
dents, proposals have been put forward to conduct specific 
coverage improvement programs on a sample basis. It is 
suggested that using sampling for coverage improvement 
has the potential to reduce costs, speed the completion 
of the census, and reduce nonsampling error and total 
error. With regard to considerations of data quality, 
coverage improvement programs can result in erroneous 
enumerations (overcount) as well as adding missed house- 
holds and persons to the census. If coverage improvement 
programs are carried out on a sample basis by higher- 
quality staff using careful procedures, it is possible 
that quality may be improved although experience with 
post-enumeration coverage evaluation surveys would 
appear to support this hypothesis. On the negative side, 
there are problems of costs and delays in estimation 
raised by the use of sampling for coverage improvement 
programs. 

In 1970 the Census Bureau carried out two coverage 
improvement programs, the National Vacancy Check and the 
Post-Enumeration Post Office Check, for samples of house- 
holds. In 1980 there was a deliberate decision to imple- 
ment all procedures on a 100 percent basis and minimize 
imputation of entire households. There is evidence that 
the 1970 National Vacancy Check, which involved revisiting 
a small sample of units originally classified as vacant 
and making a careful determination of their status on 
Census Day, came close to measuring the actual net under- 
count of occupied housing units. The 1980 Vacant/Delete 
Check, while importantly reducing undercount, also con- 
tributed to overcount (see Chapter 5). 

Miskura et al. propose to consider the benefits of 
sampling for coverage improvement and describe four 
research projects geared toward developing sample-based 
coverage improvement programs for 1990: 

(1) Work on sample design issues, such as development 
of a sampling frame, choice of sample unit, and 
possible stratification; 

(2) Investigation of selection and data collection 
methodologies; 

(3) Research on estimation from the results of 
coverage improvement sampling operations; and 

(4) Research directed at translating the findings of 
the estimation work into required additions to the 
census, for example, imputation procedures for 
"persons" corresponding to the estimated 
undercount. 

The Census Bureau's 1986 pretest plans initially 
included a proposal to simulate implementing the Vacant/ 
Delete Check on a sample basis. Simulation was proposed 
because a sample of vacant units at one pretest site 
would be too small to support reliable analysis (see 
Johnson, 1984). Current plans do not include this 
research (Bureau of the Census, 1985b). 

In Chapter 5, the panel recommended that the Census 
Bureau carefully evaluate previously tried and proposed 
coverage improvement procedures to select only the most 
promising for inclusion in the 1990 research and testing 
program and to drop the rest from further consideration. 
For the procedures that are retained in the test plans, 
the panel recommends that the Census Bureau consider 
whether sampling offers any advantages. In accord with 
prior recommendations in this chapter, the panel suggests 
that sampling will be advantageous only for those programs 
that are carried out in the later stages of follow-up and 
where there is the possibility to achieve substantial cost 
savings. 

Reviewing the coverage improvement procedures discussed 
in Chapter 5, sampling is not recommended for any of the 
address checks carried out prior to Census Day. These 
programs are important for developing a complete list of 
housing units, which is an essential tool for obtaining 
complete population coverage. Among the coverage improve- 
ment procedures implemented after Census Day, the Vacant/ 
Delete Check stands out as a procedure that: (1) proved 
effective in reducing undercount in both 1970 and 1980 
and will undoubtedly be used in 1990 and (2) cost a large 
sum of money in 1980 (at least $36 million) and therefore 
offers the potential for cost savings. 

The panel therefore supports research on the use of 
sampling for the Vacant/Delete Check, particularly as the 
experience in 1970 with conducting this operation on a 
sample basis suggests that a carefully controlled sample 
operation affords the opportunity to reduce erroneous 
enumerations (overcount) as well as add overlooked house- 
holds and persons to the census count. The panel urges 
that such research be carried out as soon as feasible. 

Recommendation 6.5. We recommend that the Census 
Bureau consider the use of sampling for those coverage 
improvement programs that are implemented in the final 
stages of census operations and where there is poten- 
tial for significant cost savings. We recommend that 
the Census Bureau simulate sampling in the Vacant/ 
Delete Check program in an upcoming pretest. 



SAMPLING FOR CONTENT 

Every census since 1940 has obtained responses for some 
content items from samples rather than from 100 percent 
of the population. The use of sampling in 1940 was very 
limited, but by 1980 the majority of population and 
housing items were asked on a sample basis (see Bureau of 
the Census, 1978b). We briefly recapitulate the high- 
lights of the use of sampling for content collection in 
recent censuses. 

The 1940 census obtained most items from every- 
one; a few items were asked of a 5 percent sample of the 
population. 

The 1950 census extended the use of sampling for 
content and featured a fairly complex sample design. 
About two-fifths of the questions were asked on a sample 
basis. Sample sizes for population items were 20 percent 
and 3.3 percent (one-sixth of the 20 percent sample). A 
matrix design was used for housing sample items whereby 
one-fifth of households was asked one or two housing 
items in addition to the complete count questions. 

In 1960, about three-fourths of the population items 
and two-thirds of the housing items were asked on a 
sample basis. Sample sizes were 25 percent for popula- 
tion items and 25, 20, and 5 percent for housing items. 
The 1960 census first introduced the concept of "short" 
and "long" forms. In the first stage of census enumera- 
tion, every household filled out the short form. At 
every fourth residence, the occupants were also asked to 
complete one of two versions of a long form, each 
containing the 25 percent population and housing items 
but either the 20 percent or 5 percent housing questions. 

The use of sampling for content in 1970 was 
similar to that in 1960. There was a short form that went 
to 80 percent of households and two different versions of 
the long form. Each version included the 100 percent 
population and housing items and a common set of items 
asked of 20 percent of households, but one version 
included as well a set of questions asked of 15 percent 
of households and the other a set asked of 5 percent of 
households. 

In 1980, there was only one long form, but dif- 
ferent fractions of households received the long form 
depending on the population size of their place of resi- 
dence. In places expected to exceed 2,500 population, 
one in every six households received the long form, 
while, in smaller places, one in every two households 
received the long form. The overall sampling rate was 
approximately 20 percent. The primary reason for changing 
from a uniform 20 percent sampling rate to rates of 50 
percent for small places (about 5 percent of the popula- 
tion) and 16.7 percent for all other places was to 
provide reliable per capita income data for use in 
general revenue sharing allocations for all places. 
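
The gain in reliability from the higher sampling rate in 
small places can be illustrated with a rough calculation. 
The sketch below is an illustration only: it uses the 
standard error of an estimated proportion under simple 
random sampling with a finite population correction as a 
stand-in for the reliability of an item such as income, and 
the place size of 600 households is hypothetical. 

```python
import math


def se_of_proportion(households, sampling_rate, p=0.5):
    """Standard error of an estimated proportion under simple random
    sampling without replacement (hypothetical illustration)."""
    n = households * sampling_rate
    # Finite population correction matters when the rate is large.
    fpc = (households - n) / (households - 1)
    return math.sqrt(p * (1 - p) / n * fpc)


small_place = 600  # hypothetical number of households
se_one_in_six = se_of_proportion(small_place, 1 / 6)
se_one_in_two = se_of_proportion(small_place, 1 / 2)
```

Under these assumptions, moving from a one-in-six to a 
one-in-two sample cuts the standard error to somewhat less 
than half its former size. 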

The current short-form/long-form arrangement is the 
result, historically, of trading off, for each possible 
item, the costs of putting it on the short form, on the 
long form, or not including it at all against the benefits 
of acquiring responses on the item from either a sample 
or a complete enumeration. The costs of including items 
in census questionnaires comprise increased respondent 
burden and hence unit and item nonresponse, increased 
time and resources required for processing of the infor- 
mation, and, perhaps above all, increased difficulty of 
census operations. The Census Bureau cannot hope to 
collect every item that users might want. Benefits 
depend on how the census information collected will be 
used. 

The Census Bureau has a long-established process for 
evaluating proposals for content items to include on the 
questionnaire and for determining whether it is acceptable 
to ask them only on the long form or whether they must be 
included on the short form. Generally, the presumption 
is that items should be restricted to the long form, in 
order to reduce burden and processing costs, unless it 
can be demonstrated that the data are required for very 
small geographic areas such as city blocks or small 
places (see the discussion in Chapter 2). 



Sampling Plans for Content in 1990 

The Census Bureau is currently in the process of obtain- 
ing reactions from data users regarding proposed content 
for 1990. The Census Bureau has also completed a pre- 
liminary assessment of population data needs for the 1990 
census by subject item and level of geographic area detail 
based on a survey of federal, state, and local agencies 
of mandated requirements for census information (Herriot, 
1984). 

Current plans for the 1990 census are to continue the 
use of two different sampling rates for the long form as 
in 1980. The Census Bureau (Johnson, 1984) is also con- 
sidering conducting a follow-on sample survey that would 
collect additional items that are not on either the long 
form or the short form for about 1-2 million households 
(1-2 percent). The sample for the follow-on survey would
be drawn from households receiving the census short form
and would be fielded about two months after the completion
of nonresponse follow-up in the census. Items being
considered include noncash income, disability, and education.
Follow-on surveys have been conducted in connection with
previous censuses, but usually directed toward specific
populations and not fielded until a year or more after
the census. (In this regard, the Census Bureau is
considering for 1990 a special follow-on survey of residents
of mobile homes; see Bureau of the Census, 1985b.)

The proposed follow-on survey of a nationally
representative sample of households enlarges the set of choices
with regard to inclusion of items in the census. It offers
the possibility of obtaining data for items not currently
on the long form for which a lower sampling rate is
acceptable. It also offers the possibility of moving
some items currently on the long form to the follow-on
questionnaire and thereby perhaps increasing unit and
item response rates in the census. Finally, greater
detail can be obtained for items on the follow-on survey
given the much reduced sample size, compared with what is
feasible for the long form. However, the need for a
follow-on survey should be carefully assessed, as should
the appropriateness of including particular items. It
should not be assumed that such a survey will provide a
vehicle for obtaining all the detail that the census
itself cannot accommodate.

The panel had available only sketchy information on
the content and purpose of the proposed 1990 follow-on
survey for which a pretest is planned in 1986. The panel
is concerned that the Census Bureau has not applied the
same stringent criteria for determining items to include
in the follow-on survey as has been the practice with
regard to the long form. For example, the panel is
troubled by the proposal to ask questions on noncash
income, given that respondents may react negatively and
that alternative data sources currently exist for
information on this topic, including the new Survey of
Income and Program Participation and administrative
records. The panel suggests that the Census Bureau
articulate explicit criteria for an item's inclusion in
the follow-on survey. From there, decisions to include
items on the follow-on survey (if it is carried out) should
be made using a process that is similar to the one for
including items on the long form.

Recommendation 6.6. We recommend that the Census
Bureau refine and make more explicit its criteria for
inclusion of items in the proposed follow-on survey
that is being considered for the 1990 census.



Possible Alternatives 

In considering issues of sampling for content, the panel
noted a few alternatives to the current short-form/
long-form breakdown with or without a follow-on survey:

(1) Modified status quo. The sampling rates for the
long form could be something other than 16.7 percent and
50 percent based on size of place. There might be three 
or more strata with different sampling rates for each. 
Perhaps there would be only a change in the sampling rates 
in the current two strata. The panel is not aware of any 
consideration of such alternatives by the Census Bureau. 

(2) Matrix sampling for the long form. There would
be several long forms each containing some items in common
and some different items. The sampling frame would be
divided into several groups and each group would receive
a different long form. This procedure, which was followed
to some extent in the 1950, 1960, and 1970 censuses, would
allow a greater total number of questions on the long
form. The panel believes that the logistical problems of
such an approach are formidable. Moreover, user experience
with two sets of data products in the 1970 census
(one set based on the 15 percent sample and the other on
the 5 percent sample) suggests that it is preferable to
have one set of data records that permit
cross-classifications among all items.

(3) One-form census with follow-on survey. The short
form might be lengthened to include items that were impor- 
tant for small areas, for example, income. The proposed 
follow-on survey could include the remaining long-form 
items. Data from 1980 census returns suggest that a 
longer short form might not reduce initial response rates 
appreciably. Overall, the mail return rate in 1980 for 
short forms was about 1.5 percentage points higher than 
the rate for long forms. In centralized district offices, 
which were responsible for central cities containing 
hard-to-count areas, the difference was about 2.5 percentage
points (Turner, 1984, 1985). However, the
proportion of questionnaires requiring follow-up for item 
nonresponse was much higher for long forms than for short 
forms, based on data from an experiment in the 1980 
census using alternative questionnaires (see Pansier et 
al., 1981; Mockovak, 1982a, 1982b, 1983). 




The one-form census is such a substantial departure
from the current practice that it is probably only of
interest for decennial censuses in the year 2000 or
later. Historically, of course, censuses used to consist
of only one form. The two-form census came about in
order to reduce costs and respondent burden, while
retaining the capability of producing small-area detail
and detailed tabulations for most items. A one-form
census with a follow-on survey would be likely either to
be more expensive than the current practice, if the
census form should include a large proportion of questions
currently on the long form (particularly income,
occupation, and industry, which have the highest
processing costs), or to result in a severe loss of
small-area and subgroup detail, if the follow-on survey
included all or most of the current long-form items.

The panel has not tried to put together a comprehensive
list of alternatives to the current short-form/
long-form arrangement in terms of content breakdown or
sampling rates, nor has it extensively analyzed the
several alternatives outlined above. The current content
of both the long- and short-form questionnaires is the
result of an elaborate process with widespread consultation
among potential users, and the panel has no specific
modifications to propose. However, particularly in view
of the Census Bureau's consideration of a follow-on
survey, the panel believes it would be worthwhile for the
Census Bureau to explore alternatives such as those listed
above. If one or more alternatives look desirable,
consideration should be given to pretesting them.



THE USE OF ADMINISTRATIVE RECORDS AND SAMPLING FOR
IMPROVED ACCURACY OF CONTENT 

Information on the wide range of content items collected in
the census typically comes from individual responses to
questionnaires (although a small proportion of responses
are obtained in other ways, such as through imputation).
One of the methods the Census Bureau has frequently used
to evaluate the quality of reporting in the decennial
census is to reinterview a sample of census respondents
after Census Day. Matches with other surveys such as the
CPS and with administrative records have also been used
for content evaluation. To date, virtually all content
evaluations have been carried out on a postcensal basis
(Bureau of the Census, 1978b; Miskura and Thompson, 1983).
The results have been used to improve questionnaire design
in subsequent censuses as well as to inform users of
census data about limitations in the statistics, but have
not been used to alter responses to the census itself.

Miskura et al. (1984) discuss the possibility of making 
an integral part of the census enumeration the use of 
survey procedures to reinterview samples of households to 
verify their responses and perhaps adjust content items. 
They propose several research projects in this area. 
Most of their discussion, however, concerns reinterview
operations, such as the Vacant/Delete Check, that are
more properly characterized as coverage improvement 
programs designed to add occupied housing units and 
persons to the count rather than to change responses to 
content items. 

Brown (1984) discusses several kinds of uses of 
administrative records for content collection and 
evaluation: 

(1) Content evaluation. Administrative records are 
frequently used for this purpose. For example, an 
evaluation of reporting of utility expenditures in 1980 
compared census responses with administrative records 
from utility companies. 

(2) Content improvement. Brown discusses the possible 
use of administrative records as a source of values for 
imputation of missing data in the questionnaire. 

(3) Content collection. Brown notes (p. 5) that "the 
use of administrative records as a source of some census 
data may reduce respondent burden and improve the quality 
. . . without incurring enumeration costs." 

(4) Administrative records census (ARC). Brown
reviews proposals to replace the census both for the 
count and for content with data developed from adminis- 
trative records, as is currently done in some European 
countries. 

The panel considered the use of administrative records 
for content collection and improvement but did not con- 
sider the issue of an administrative records census. 
(Chapters 5 and 8 review uses of administrative records 
for coverage improvement and coverage evaluation.) The 
panel has made clear its belief in the importance, for 
1990, of maintaining the traditional concept of enumera- 
tion as the heart of census methodology. However, the 
panel believes that administrative records can make
important contributions to the census, particularly in
the area of improved accuracy of content.

The Importance of Improving Accuracy of Content

The concern over completeness of population coverage in
the census can obscure equally valid concerns over the
accuracy of the content. Analysis of the fund allocation
formula for general revenue sharing, for example, has
shown that the per capita income component of the formula
is more important than the population component in
determining the distribution of funds among jurisdictions
(Robinson and Siegel, 1979; Siegel, 1975). Yet reports
of income in the census, as in household surveys, are
known to be subject to large errors (Bureau of the Census,
1970a, 1973a, 1975b). These facts suggest that some
resources can be usefully directed to improving the
accuracy of content.

Evaluation research has documented problems in the
reporting of many other items in the census besides
income. The panel believes that serious attention should
be directed to research that might lead to improved
accuracy of selected census content items. We believe the
research program should include design of operations to
verify responses as part of the census enumeration and,
as a corollary, consider the issue of adjusting census
reports based on the outcome of such verification
operations. We also support research into the possibility of
obtaining some data items by methods other than traditional
census responses. The primary alternative source
is administrative records. Obviously, not all items can
or should be included in verification or alternative
collection operations. For the content improvement
programs that appear worthwhile, sampling will often be
necessary to make the process manageable in the field and
to keep costs within reasonable bounds.

Because programs to adjust census reports based on
verification or alternative data collection operations
have rarely been a part of decennial census methodology,
it would be prudent for the Census Bureau to set forth
and follow a step-by-step research and testing program.
Extensive research should be concentrated on a few key
items.

Recommendation 6.7. We recommend that the Census
Bureau conduct research and testing in the area of
improved accuracy of responses to content items in the
census. We recommend further that the content
improvement procedures examined not be limited to
reinterviews of samples of respondents, but include
the use of administrative records.



Improving the Accuracy of Housing Items 

In considering the issue of content improvement, the 
panel looked most closely at questions on structural 
characteristics of housing units, particularly the item 
on age of the structure or year when the structure was 
built. We recognize that there are many other content 
items, such as income, that should be reviewed to identify 
means of improving their quality. However, time con- 
straints precluded examining other items besides housing 
structure characteristics. The housing items offer the 
important advantage that concerns over possible invasion 
of privacy from using administrative records as a data 
source seem very unlikely to arise in contrast to the use 
of administrative records to obtain, for example, income 
data. 

Age of structure is an important component of one of 
the two fund allocation formulas for the Community 
Development Block Grant Program. The intent of this 
formula is to direct funds to older, declining cities in 
which the housing stock includes a disproportionate share 
built prior to 1940 (Gonzalez, 1980). Reporting of this
item in the census has observable problems (Bureau of the
Census, 1972, 1975a; Katzoff and Smith, 1983). The
nonresponse rate is fairly high, as is the index of
inconsistency (a measure of the difference between census
reports and reports obtained in reinterviews for a sample
of census respondents). It has been observed that, in
some cities, the proportion of housing reported as being 
built before 1940 has been increasing rather than 
decreasing. 
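
The index of inconsistency mentioned above is not defined further in this report. As a rough illustration only, the sketch below assumes the conventional form used in response-variance work for a single yes/no category (for example, "built before 1940"): the observed rate of disagreement between census and reinterview reports, divided by the rate of disagreement that would be expected if the two reports were independent. The function name and the counts used to try it out are hypothetical.

```python
def index_of_inconsistency(both_yes, census_only, reinterview_only, both_no):
    """Index of inconsistency (percent) for one yes/no category.

    Assumed definition (not spelled out in the report): the gross
    difference rate divided by the disagreement expected if census and
    reinterview reports were statistically independent.
    """
    n = both_yes + census_only + reinterview_only + both_no
    if n == 0:
        raise ValueError("no matched cases")
    # Share of matched cases in which the two reports disagree.
    gross_difference_rate = (census_only + reinterview_only) / n
    # Share reported "yes" in each source.
    p_census = (both_yes + census_only) / n
    p_reinterview = (both_yes + reinterview_only) / n
    expected = p_census * (1 - p_reinterview) + p_reinterview * (1 - p_census)
    if expected == 0:
        raise ValueError("category is constant in both sources")
    return 100 * gross_difference_rate / expected


# Hypothetical matched sample of 100 structures.
print(round(index_of_inconsistency(30, 10, 10, 50), 1))
```

Under this definition a value near 0 means the two reports almost always agree, while a value near 100 means they agree no more often than unrelated reports would.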

It is not surprising that this item should be poorly 
reported. People who rent their living quarters, par- 
ticularly if they recently moved into the unit, would be 
unlikely to have accurate information regarding the age 
of the structure. Even homeowners may be uncertain about 
when their homes were built. It would seem that buildings 
housing several families, such as apartments or condo- 
miniums, will be those for which response errors are 
largest. For these structures, information on age is
likely to be available from administrative sources such
as assessment and tax records. A specific suggestion for
the use of administrative records for a sample of structures
to obtain better data from the census on age and
related items is outlined in Appendix 6.3.

A second set of items on the 1980 census form that
deserves comment is the set of questions on utility bills.
The discussion by Tippett and Takei (1983) establishes
that there is an upward bias on the order of 50 percent
in the responses to these items. A bias of this order
strongly calls into question the usefulness of retaining
such questions on the census form. Alternative methods
of collecting such data, in particular from the utilities,
should be considered.

We understand that the Census Bureau is considering
testing questionnaires that would ask owners or managers
of apartment buildings the items on the structure, such
as year built, number of units, condominium/cooperative
status, heating equipment, fuels used, source of water,
etc. This method is used in the censuses of several
European countries at present (Redfern, 1983). We believe
that it is worthwhile to explore this approach, but we do
not feel it should replace research on the use of
administrative records.

For some housing items it may be appropriate to consider
obtaining data from administrative records and
dropping the items from the census. For example, if the
primary use for age of structure is as input to the
community development block grant formula, and
cross-tabulation of this item with other census items is of low
priority for users, then a cost-effective approach would
be to devote resources to gaining access to and improving
administrative records for the date of construction and
to eliminate this item from the census questionnaire.

There are problems in using administrative records to
obtain housing structure items. Records are kept in
different ways and vary in quality and accessibility in
different jurisdictions. For example, records such as
tax assessors' rolls are highly computerized in some
jurisdictions, while maintained on paper in other areas.
The number and types of characteristics recorded for each
property also vary (see Bureau of the Census, 1984a).
Nonetheless, investment in research and testing of the
use of administrative records for housing structure items
offers the potential to improve the accuracy of the data
while reducing respondent burden in the census (or,
alternatively, permitting other useful items to be put on
the questionnaire). Similarly, research into the
feasibility of obtaining utility expenses from utility
company records would appear very worthwhile.

Recommendation 6.8. We recommend that the Census 
Bureau investigate the cost and feasibility of alter- 
native ways of obtaining data on housing structure 
items. Possibilities include: (1) obtaining housing
structure information on a sample basis from adminis- 
trative records and using this information to verify 
and possibly to adjust responses in the census; (2) 
obtaining structure information solely from adminis- 
trative records and dropping these items from the 
census; and (3) asking structure questions of a
knowledgeable respondent such as the owner or resident 
manager. We recommend that any trial use of a 
"knowledgeable" respondent procedure include a check 
of the data obtained from such respondents against 
data from administrative records. 



APPENDIX 6.1 
COST ESTIMATES FOR A SAMPLE CENSUS 

At various times during the 1970s, the Census Bureau
prepared cost estimates for conducting a mid-decade
census. These estimates covered several different
scenarios, including a large sample survey. The estimates
were very rough and a mid-decade census has never been
conducted, so that there is no experience with which to
validate the numbers. Nonetheless, the estimates indicate a
range for the proportionate cost of a large survey
compared with complete enumeration.

Figure 6.1 shows several lines plotting cost against
sampling rates developed from the Census Bureau estimates
for a mid-decade census. These lines indicate that a 50
percent sample survey (the x's on the chart) would cost
about 70 to 80 percent of the cost of a complete census
and that a 75 percent sample survey (the y's on the
chart) would cost about 85 to 90 percent as much.
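
The proportions quoted above can be checked against the dollar figures printed on Figure 6.1. The short sketch below does only that arithmetic. Pairing each 50 percent and 75 percent point with one of the two full-census estimates (Point E or Point G) is an assumption made here for illustration, as are the dictionary and function names.

```python
# Dollar figures (millions, 1976 dollars) as labeled on Figure 6.1.
# ASSUMPTION: the lower 50% and 75% points belong with Point E and the
# higher ones with Point G; the figure labels alone do not settle this.
FULL_CENSUS = {"line ending at E": 475, "line ending at G": 500}
SAMPLE_COSTS = {
    "line ending at E": {50: 340, 75: 410},
    "line ending at G": {50: 385, 75: 445},
}


def cost_share(line, rate):
    """Cost of a sample survey as a percentage of the full-census cost."""
    return 100 * SAMPLE_COSTS[line][rate] / FULL_CENSUS[line]


for line in FULL_CENSUS:
    for rate in (50, 75):
        share = cost_share(line, rate)
        print(f"{line}: {rate}% sample costs about {share:.0f}% of a full census")
```

With this pairing, both 50 percent points fall between 70 and 80 percent of the full-census cost and both 75 percent points fall between 85 and 90 percent.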

The coefficient of variation for an estimate of the
number of blacks for places of different sizes with
sampling rates of 50 percent and 75 percent would be
approximately as shown in Table 6.1. For each size of
place, it is assumed that the black population is about
12 percent of the total. The table also assumes that the
sampling rate used would not vary by size of place.



TABLE 6.1 Coefficient of Variation for Estimates of the Black Population by
Size of Place and Size of Sample

                Coefficient of Variation for
                Estimate of Black Population (%)

Place Size      50% Sample

10,000          5
 5,000          7-8
 2,500          10
 1,000          15

NOTE: The black population is assumed to represent about 12% of the total for
each place. The calculation of the coefficient of variation includes a factor
of 2 for the design effect of sampling entire households rather than
conducting a simple random sample of persons.

SOURCE: Calculated from Herriot (1984:Table 1).
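
As a rough guide to how figures of this size arise, the sketch below computes a coefficient of variation under simplifying assumptions that are ours, not the report's: place size is counted in persons, the sample is treated as a simple random sample apart from the design-effect factor of 2, and no finite population correction is applied. It is not a reconstruction of the calculation from Herriot (1984), and the function name is hypothetical.

```python
import math


def cv_percent(place_size, sampling_rate, share=0.12, design_effect=2.0):
    """Approximate coefficient of variation (percent) of an estimated subgroup count.

    ASSUMPTIONS (not from the report): place_size counts persons, sampling
    is simple random apart from the design-effect factor, and there is no
    finite population correction.
    """
    n = place_size * sampling_rate                      # sample size
    relative_variance = design_effect * (1 - share) / (share * n)
    return 100 * math.sqrt(relative_variance)


for size in (10_000, 5_000, 2_500, 1_000):
    print(size, round(cv_percent(size, 0.50), 1), round(cv_percent(size, 0.75), 1))
```

For a 50 percent sample this approximation gives roughly 5, 8, 11, and 17 percent for places of 10,000, 5,000, 2,500, and 1,000, which is of the same general size as the tabled values without matching them exactly.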



[Figure 6.1: line chart of census cost in millions of dollars (vertical
axis, scale marked from 100 to 500) against sampling rate in percent
(horizontal axis, scale marked from 20 to 100). Labeled points:
A (0.2% rate, $10 million); (2% rate, $60 million); x's at the 50% rate,
$340 million and $385 million; y's at the 75% rate, $410 million and
$445 million; E (100% rate, $475 million); G (100% rate, $500 million).]

NOTE: All cost estimates were developed assuming 1976 dollars and
estimated 1985 workloads (number of housing units).

SOURCE: Line ABCDE: Department of Commerce (1976:1-2, A-lll); Bureau of
the Census (1976c). Line ABCFG: Point F adjusts Point D costs of $275
million by $55 million, representing planned coverage and other
improvements for the 1980 census that the Census Bureau factored into
the Point E estimate but not the Point D estimate (see Bureau of the
Census, 1976b: level 3 worksheet). Point G adjusts the Point E estimate
of $475 million by $25 million, representing workload increase from 1980
to 1985 that the Census Bureau factored in for all estimates except that
for Point E (see Bureau of the Census, 1976c: Explanatory Notes). Line
ABCHG: Point H represents the result of multiplying the Point G estimate
by an estimate, developed by the General Accounting Office in 1971, of
the proportion that the cost for a 25% sample survey would be of the
cost for a full census (GAO, 1971:1).

FIGURE 6.1 Census costs estimated for varying sampling
rates.



APPENDIX 6.2 
ILLUSTRATIVE FOLLOW-UP SCENARIO USING SAMPLING 

Census Bureau staff have estimated that unit nonresponse
follow-up for the 20 percent of households that did not
mail back their questionnaires in 1980 cost about $146
million. 1 Follow-up was conducted in two stages in
1980. During the first stage, enumerators were instructed
to make up to four callbacks to try to complete an
interview. Households for which no information was obtained
during this stage, even from neighbors or landlords as a
last resort, were followed up as part of the second-stage
operation.

Data are available for a few district offices in 1980
on the number of callbacks required for enumerators to 
obtain an interview from a nonresponding household and 
the costs of completion. These data (Ferrari and Bail 
1983) indicate that about 1.5 calls were required during
the first follow-up stage for enumerators to complete an
interview, that each completed interview cost about 
$3.90, and that about 3 percent of cases were not 
resolved during the first follow-up operation. 

Table 6.2 uses the above, admittedly limited, data to
develop a hypothetical distribution of households requiring
follow-up by number of calls to obtain a filled-in
questionnaire and the associated costs. The scenario
shown assumes a two-stage follow-up operation with up to
four callbacks allowed in the first stage.

If the first follow-up operation were restricted to
two calls and the remaining nonrespondents were sampled
at a 25 percent rate in the second stage of follow-up,
the cost structure would appear as in Table 6.3. Net
savings might be about $35 million ($146 minus $111
million) if the lower bound estimate of the costs of a 25
percent sample compared with a complete effort is used
(from Figure 6.1). If the higher bound estimate is used,
so that each call costs $12 in the second stage of
follow-up with a 25 percent sample, then the total cost
shown in Table 6.3 would be $126 million ($66 plus $60)
and net savings might be about $20 million from the use
of sampling ($146 minus $126). If the Vacant/Delete



1 Personal communication from Peter Bounpane to the 
Panel on Decennial Census Methodology, March 9, 1984. 




TABLE 6.2 Hypothetical Distribution of Follow-up Callbacks and Costs,
Scenario 1: Up to Four Callbacks Permitted in First Stage of Follow-up

                     Housing Units        Callbacks to   Number of   Cost ($)
                   Number   Percentage    Complete       Callbacks   ($4/call)

First follow-up      8.5       9.7            1             8.5         34
                     4         4.5            2             8           32
                     2         2.3            3             6           24
                     1         1.1            4             4           16
  Subtotal          15.5      17.6                         26.5        106
Second follow-up     2         2.3            5            10           40
Total               17.5      19.9                         36.5        146

NOTE: Number of housing units, number of callbacks, and cost are in millions; 17.5 million
housing units is about 20% of the total count of 88 million housing units in 1980.

SOURCE: See discussion in Appendix 6.2.



Check were also conducted on a 25 percent sample basis,
savings for this program might be in the range of $36
million x (100-58)% = $15 million, to $36 million x
(100-75)% = $9 million. In total, the savings from
conducting both nonresponse follow-up and the Vacant/



TABLE 6.3 Hypothetical Distribution of Follow-up Callbacks and Costs,
Scenario 2: Two Callbacks Permitted in First Stage of Follow-up, Remaining
Cases Sampled at 25% in Second Stage of Follow-up

                    Housing    Callbacks      Number         Cost(a)
                    Units      to Complete    of Callbacks   ($)

First follow-up      8.50          1              8.5          34.0
                     4.00          2              8.0          32.0
  Subtotal          12.50                        16.5          66.0
Second follow-up      .50          3              1.5          13.5
                      .25          4              1.0           9.0
                      .50          5              2.5          22.5
  Subtotal           1.25                         5.0          45.0
Total               13.75                        21.5         111.0

NOTE: Number of housing units, number of callbacks, and cost are in millions.

(a) Costed at $4 per call in the first follow-up stage and $9 per call in the second follow-up stage,
assuming that a 25% sample costs about 58% of a complete effort (see Appendix 6.1; note that
58% is the lower bound estimate; 75% is the upper bound estimate).




Delete Check with the use of sampling might be in the
range of $30 to $50 million, or about 3 to 5 percent of
the total cost of the 1980 census. This scenario makes
no allowance for additional expenditure on each call that
might be made to achieve higher quality through sampling.
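
The arithmetic behind Tables 6.2 and 6.3 and the savings figures quoted above can be retraced with a few lines of code. The sketch below is offered only as a check on that arithmetic; the constant and function names are hypothetical, while the unit counts, per-call costs, and the $36 million figure for the Vacant/Delete Check are taken from the appendix.

```python
COST_PER_CALL_STAGE1 = 4.0  # dollars per call (Tables 6.2 and 6.3)
# Millions of housing units, keyed by the number of calls needed to complete.
STAGE1_UNITS = {1: 8.5, 2: 4.0, 3: 2.0, 4: 1.0}
STAGE2_UNITS = 2.0          # millions resolved only in the second follow-up (5 calls)
VACANT_DELETE_COST = 36.0   # millions of dollars for the complete program


def scenario1_cost():
    """Table 6.2: up to four callbacks in stage one, complete stage two."""
    calls = sum(k * units for k, units in STAGE1_UNITS.items()) + 5 * STAGE2_UNITS
    return calls * COST_PER_CALL_STAGE1          # millions of dollars


def scenario2_cost(sample_rate=0.25, stage2_cost_per_call=9.0):
    """Table 6.3: two callbacks in stage one, remaining cases subsampled."""
    stage1_calls = 1 * STAGE1_UNITS[1] + 2 * STAGE1_UNITS[2]
    # Units that needed 3, 4, or 5 calls are deferred to stage two and sampled.
    deferred = {3: STAGE1_UNITS[3], 4: STAGE1_UNITS[4], 5: STAGE2_UNITS}
    stage2_calls = sum(k * units * sample_rate for k, units in deferred.items())
    return stage1_calls * COST_PER_CALL_STAGE1 + stage2_calls * stage2_cost_per_call


def vacant_delete_savings(sample_cost_share):
    """Savings if the sampled program costs the given share of the full program."""
    return VACANT_DELETE_COST * (1 - sample_cost_share)


low_total = scenario1_cost() - scenario2_cost(stage2_cost_per_call=12.0) + vacant_delete_savings(0.75)
high_total = scenario1_cost() - scenario2_cost(stage2_cost_per_call=9.0) + vacant_delete_savings(0.58)
print(f"combined savings: about ${low_total:.0f} to ${high_total:.0f} million")
```

The two stage-two costs per call ($9 and $12) correspond to the lower and upper bound assumptions that a 25 percent sample costs 58 or 75 percent of a complete effort.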
The selection of a 25 percent sampling rate is purely 
for illustration. The impact of this rate and others on 
the quality of the population estimates for small areas 
would need to be assessed. We note that the overall rate 
of contact for the total population of an area using a 25 
percent second-stage follow-up sample implemented after 
two calls in the first stage would be about 95 percent 
for an area with "average" unit nonresponse of 20 percent 
while the rate of contact would be under 90 percent for 
an area with a 50 percent nonresponse rate. 
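
The contact rates quoted above follow from the Table 6.3 distribution if one assumes, as this sketch does, that the share of nonresponding units resolved within two calls (12.5 of 17.5 million) is the same in every area regardless of its nonresponse rate. That uniformity is an assumption made here for illustration, and the function name is hypothetical.

```python
def contact_rate(nonresponse_rate, resolved_in_two_calls=12.5 / 17.5, sample_rate=0.25):
    """Share of an area's housing units reached by mail, by two calls, or by the stage-two sample.

    ASSUMPTION: the fraction of nonrespondents resolved within two calls
    is taken from Table 6.3 and applied unchanged to every area.
    """
    mail = 1 - nonresponse_rate
    stage1 = nonresponse_rate * resolved_in_two_calls
    stage2 = nonresponse_rate * (1 - resolved_in_two_calls) * sample_rate
    return mail + stage1 + stage2


for rate in (0.20, 0.50):
    print(f"nonresponse {rate:.0%}: contact rate {contact_rate(rate):.1%}")
```

Under that assumption the contact rate is a little over 95 percent for an area with 20 percent nonresponse and a little over 89 percent for an area with 50 percent nonresponse.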



APPENDIX 6.3 

IMPROVING DATA ON HOUSING STRUCTURE ITEMS:
A SUGGESTED METHOD 

The panel offers the following scheme as a suggestion for 
obtaining more reliable data on age of structure and 
related housing items. The basic concept is to develop a 
sample of structures from the address lists compiled for 
the census and to obtain data from local administrative 
records about the characteristics of the structures in 
the sample. It may prove most feasible to carry out this 
scheme in urban areas in which census address listings 
and identifiers carried on local administrative records 
can most readily be matched. 

Prior to the census, a reasonably complete list of
housing unit addresses is constructed. Units that have
the same basic address (such as Apt. A and Apt. B at the
same street number) can initially be considered to be
part of the same structure. Hence, it is possible to
draw a sample of basic addresses that is a good proxy for 
a sample of structures. 

The precise design and size of the sample would depend 
on the nature of the costs, among other considerations. 
We outline one possible procedure. Assume that the 
sample of basic addresses or structures is drawn with the 
probability of selection proportional to the estimated 
number of units in the structure. For concreteness,
assume that single-unit buildings are sampled at a rate 
of 1 in 10, duplexes are sampled at a rate of 2 in 10, 
and so forth, up to structures with 10 or more housing 
units that are sampled with certainty. Administrative 
records data for age of structure and other items would 
then be obtained for the structures in the sample. 

The sample of basic addresses or structures can be 
linked to the sample of housing units in the census as 
follows. Assume that one-fifth of the households are to
receive the census long form, which asks for age of 
structure and related housing items. Given that the 
sample of basic addresses is specified at the time of the 
mailing of the census forms, all of the long-form households
could be selected from those addresses. Specifically,
one scheme would be to send long forms to: all
single housing unit structures that are in the sample of 
basic addresses, two households in all other selected 
structures with less than 10 units, and one-fifth of the 
households in all structures with 10 or more units. 


Recalling the sampling rates for different sized
structures, this will achieve a one-fifth long-form sample of
structures with more than one unit. To achieve a one-fifth
long-form sample of single-unit buildings, it would
also be necessary to send long forms to single-unit
structures not in the sample of basic addresses. This
sampling scheme has the drawback of increasing sampling
variance for the long form due to the clustered design.
However, it has the great advantage that all of the
long-form sample for people living in structures with two
or more housing units is included in the sample of basic
addresses. Hence, data collected from administrative
records for these structures are available to verify or
to use in place of responses to the census.
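
The claim that this scheme yields a one-fifth long-form sample for multi-unit structures can be verified by multiplying the two selection stages. The sketch below does so with exact fractions. The function names are hypothetical, and the top-up rate for single-unit structures outside the address sample is derived here from the stated rates rather than given in the text.

```python
from fractions import Fraction


def structure_selection_prob(units):
    """Probability that a basic address is sampled: units/10, capped at certainty."""
    return min(Fraction(units, 10), Fraction(1))


def long_form_prob_within(units):
    """Share of households sent the long form inside a sampled structure."""
    if units == 1:
        return Fraction(1)          # every sampled single-unit structure
    if units < 10:
        return Fraction(2, units)   # two households per sampled structure
    return Fraction(1, 5)           # one-fifth of households in large structures


def household_long_form_prob(units):
    """Overall chance that a household receives the long form via the address sample."""
    return structure_selection_prob(units) * long_form_prob_within(units)


def single_unit_top_up_rate(target=Fraction(1, 5)):
    """Rate needed among single-unit structures outside the address sample
    to reach the target (a derived figure, not stated in the text)."""
    p = structure_selection_prob(1)
    return (target - p * long_form_prob_within(1)) / (1 - p)


for units in (1, 2, 5, 9, 10, 40):
    print(units, household_long_form_prob(units))
print("top-up rate for unsampled single units:", single_unit_top_up_rate())
```

Households in structures of two or more units come out at exactly one in five, while single-unit households reach only one in ten through the address sample, which is why additional long forms are needed for single-unit structures.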

Two options are available with respect to the question
on age of structure in the census. It could be retained on
the census form or it could be omitted. Assume the
question is retained on the census form. The simplest
processing method would be to use the value obtained from
administrative records for all individuals residing in
the structures that are in the sample of basic addresses
and to retain the answers of individuals not in the
sampled structures. It would also be possible to use
regression-type procedures to modify responses of
individuals in structures that are not in the sample using
the information obtained for the sampled structures.

Now assume the question is not included on the census
form. The values obtained from administrative records
could simply be appended to the census data records for
persons in structures that are in the sample of basic
addresses. For persons not in sampled structures, it
would be possible to assign values obtained from sampled
structures located in the same area. This should be a
very effective procedure in areas in which large numbers
of units, such as apartment complexes or suburban
developments, were constructed at the same point in time.
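
One simple way to assign values from sampled structures to unsampled ones in the same area is sketched below. The text does not specify an assignment rule; using the most common value among sampled structures in the area is an assumption made here for illustration, as are the record layout (the keys "area", "sampled", and "year_built") and the function name.

```python
from collections import Counter, defaultdict


def assign_year_built(structures):
    """Fill in year built for unsampled structures from sampled ones in the same area.

    `structures` is a list of dicts with the hypothetical keys "area",
    "sampled", and "year_built" (None when unknown). ASSUMED RULE: use the
    most common value among sampled structures in the area. Areas with no
    sampled structure are left unfilled.
    """
    by_area = defaultdict(Counter)
    for s in structures:
        if s["sampled"] and s["year_built"] is not None:
            by_area[s["area"]][s["year_built"]] += 1

    filled = []
    for s in structures:
        record = dict(s)  # copy so the input is not modified
        if not s["sampled"] and s["area"] in by_area:
            record["year_built"] = by_area[s["area"]].most_common(1)[0][0]
        filled.append(record)
    return filled


example = [
    {"area": "tract 1", "sampled": True, "year_built": 1962},
    {"area": "tract 1", "sampled": True, "year_built": 1962},
    {"area": "tract 1", "sampled": True, "year_built": 1938},
    {"area": "tract 1", "sampled": False, "year_built": None},
    {"area": "tract 2", "sampled": False, "year_built": None},
]
for row in assign_year_built(example):
    print(row)
```

A rule of this kind works best where neighboring structures share a construction date, as in the apartment complexes and developments mentioned above; where they do not, the regression-type procedures noted earlier may be a better fit.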



7 

Adjustment of Population Counts 



THE NEED FOR ADJUSTMENT 

"Since the first census in 1790 there have been problems 
in finding and accurately counting every person living in 
the United States" (Hogan, 1984a:2). However, two things 
are new in recent decades. First, the Census Bureau has 
developed and published quantitative measures of coverage 
error, measures that show that net undercoverage varies 
substantially by age, sex, race, and Hispanic origin. 
Second, the number and kind of uses to which census data 
are put have multiplied. Thus, concerns about the con- 
sequences of differential coverage error have increased 
as has pressure for the Census Bureau to reduce differential
coverage error. Both improvements in the actual
census count and subsequent statistical adjustment of 
that count have been urged. In 1980 the Census Bureau 
undertook a major effort to improve the actual count, 
especially for minority groups, but decided against 
subsequent statistical adjustment. However, many consti- 
tuencies have urged adjustment and some have instituted 
litigation to require it. 

The ultimate goal of the decennial census is that of 
accuracy of the final census numbers. The evaluation 
studies undertaken by the Census Bureau broadly identify 
inaccuracies and provide information that would be desir- 
able to use. Therefore, the panel is led to recommend 
that adjustment procedures be developed with the objec- 
tive of improving the accuracy of the census products. 

Adjustment aims, by supplementing the census counts 
with other information, to produce more accurate popula- 
tion estimates than the raw counts. Adjustment may be 
carried on to characteristics data as well. The quality 
of the adjusted census depends, then, on the accuracy of
the census counts and the other information used as well
as on the adjustment procedure adopted.

An adjustment of census figures for a region with a
low completion rate would produce numbers that, although
closer to the true values than the unadjusted numbers, as
best one can tell, would still have a great deal of
uncertainty attached to them. The panel thus attaches
importance to the goal of completeness of the census
count and views possible subsequent adjustment of the
count as a complement to, not a substitute for, continued
efforts to improve census coverage. The traditional
nature of the census as an operation in which "each
stands up and is counted" should be maintained. If the
perception of the importance of being counted were to
deteriorate, participation in what Kruskal rightly calls
a national ceremony (Congressional Research Service,
1984:49) might decline, with serious consequences for the
accuracy of the census numbers, adjusted or not.

The goal of accuracy is often unclearly specified and
can have different meanings in different contexts. For
government entitlement programs that use the census
numbers by comparing them with a cutoff point, the degree
of inaccuracy is critical only when a number is close
enough to the cutoff point that a funding decision would
be affected. For programs that use the relative change
from census to census, the accuracy of the estimate of
change is the goal. For programs that determine political
representation or distribution of public monies through
allocation procedures based on census information, the
relative accuracy of the census in different geographic
areas is crucial. As reviewed in Chapter 2, there is
evidence that differential coverage errors importantly
affect both political representation and fund allocation.
The panel believes that adjustment procedures should
focus on minimizing these errors. In recommending this
aspect of accuracy as a primary goal, we recognize that
statistical adjustments that help achieve it may reduce
the accuracy of certain other census information, for
example, measures of change from prior censuses. We
recognize the importance of other aspects of accuracy, in
particular, errors in the counts themselves, for m 
research, planning, and program purposes of local and
national users. Minimizing differential coverage errors
should reduce rather than increase most of these, 1 
different focus might reduce them more, we invite 
study and discussion of the implications of our focus on
minimizing differential coverage errors. 




In recent censuses, the production of what is referred
to as the "actual census count" already involves, for a
minority of the census households, a variety of statis-
tical procedures: imputation for forms damaged in
processing, imputations of persons in housing units
believed to be occupied although no one was ever found at
home, imputation of missing data in partially filled-in
census forms, etc. (see Bailar, 1983c; also see Appendix
3.1). Although it would not be the first instance of the
use of statistical modifications in the census, making
further adjustments to the census of the kind we discuss
in this chapter would importantly increase the use of
statistical procedures in census-taking. The panel
believes that it is not a question of taking a stand "for"
or "against" adjustment, as adversarial circumstances
press one to do. The decision-making process relative to
adjustment requires a dispassionate, rounded discussion
recognizing the full range and complexity of the tech-
nical issues.

The adjustment question is in reality a series of 
interrelated questions: If an adjustment is to be made,
what is to be adjusted: the count, or some or all of the
other census information? By what procedure? At what
level of geography? With what impact on the accuracy of 
a variety of census numbers? On what time schedule? For 
what uses of census data? So that the data set is inter- 
nally consistent or not? In the remainder of this chap- 
ter, we summarize some of the technical information 
pertinent to these questions and present the recommenda- 
tions to which we are led. Many technical questions 
remain to be answered if adjustment procedures are to be 
developed in time for their use in the 1990 census. On 
the whole, while much effort will be required, the panel 
is optimistic that substantial progress can be made, and 
many feel that this progress could well be sufficient to 
permit adjustment to become a feasible and desirable part 
of the 1990 census process. 

One of the questions raised with respect to the issue 
of adjustment is the extent of adjustment of character- 
istics information. If an adjustment is implemented, the 
panel recommends that it be carried down to the microdata 
level. This would, practically speaking, be expressed as 
a reweighting of individual records and, hence, would 
represent a coverage-based adjustment of characteristics 
information as well as the counts. However, one could at 
the same time adjust characteristics information through 
models using information from content evaluation programs
(see discussion in Chapter 6). Models for the misresponse
of characteristics information, such as the underreporting
of income, and models for the characteristics of
uncounted people could be attempted. One possibility
would be to develop models at an aggregate level and then
carry the adjustment down to the micro level using methods
discussed below. The development, testing, implementation,
and evaluation of such models involve complicated,
difficult issues that the panel has not had time to
adequately discuss. For the purposes of this report, the
panel has decided to concentrate on the adjustment of
population counts, with the adjustment of characteristics
that such a reweighting of individual records entails.

Recommendation 7.1. Completeness of the count is an
important goal, both for ensuring the accuracy of the
census and for establishing the credibility of the
census figures among all users. Adjustment should not
be viewed as an alternative to obtaining as complete a
count as possible through cost-effective means. Never-
theless, the ultimate goal is that of the accuracy of
the published figures. Given the likelihood that the
census will continue to produce different rates of
undercoverage for various population groups, and given
the equity problems caused thereby, we recommend that
work proceed on the development of adjustment proce-
dures and that adjustment be implemented if there is
reasonable confidence that it will reduce differential
coverage errors.

We note that there are several different methods of
adjustment that have been suggested so far, and we
anticipate that others will be proposed. It is possible
that a variety of alternatives, including compromise
possibilities, will be developed with evidence that each
would be an improvement over the census count, but with no
obvious basis for choosing among them. In our view, this
situation should not by itself preclude the Census Bureau
from making adjustments and picking one of the alternatives.



EVALUATING ADJUSTMENT: LOSS FUNCTIONS AND YARDSTICKS 

One would like to evaluate the numbers produced by the 
census (either based on raw counts or based on adjust- 
ments to those counts) by comparing them with the true 
values of those numbers in the population if one had a
completely accurate census. Since one cannot know those
true values, one must use methods external to the actual 
census process to obtain an independent estimate of those 
values. (Some of the methods are described elsewhere in 
this report.) Each of the methods entails positing a 
model or assumptions about both the nature of the errors 
in the census (i.e., about the process by which indivi- 
duals may be either not included or double-counted in the 
raw data) and about the nature of the method (and its 
underlying data) that produced the independent estimate 
of the census values. 

Each of two considerations has a place in the evalua- 
tion of census numbers, the error in the number itself 
and the resulting loss to society due to erroneous treat- 
ment of political jurisdictions (or other uses of the 
census number). By error we mean the difference or the
relative difference between the number produced by the 
census (either the raw or adjusted count) and the true 
value for that number in the population if we had a 
completely accurate census. By loss we mean a numeric 
measure of the impact of the error in the census number 
both for each political jurisdiction and for the United 
States as a whole. For this discussion we call these 
numeric measures "loss functions." As we are interested 
in net social gain, our prime consideration is the overall 
loss function for the country as a whole, and not the 
separate loss functions that may be adduced for each 
separate political jurisdiction. A jurisdiction's gain 
or loss of funds or political representation due to error 
is understood to be always a nonnegative loss from 
society's point of view. We are not taking the point of 
view of a single jurisdiction, which might be that any 
gain is a good thing, or of a social planner second- 
guessing the political process, which might be that some 
errors benefit society by counterbalancing deficiencies 
in laws and social policies, or even in other data, e.g., 
on income. 

The determination of the appropriate loss function for 
the country as a whole is a difficult task. Moreover, it 
is impossible to determine a single loss function that is 
appropriate for evaluating every effect of an error in 
the census numbers: each use of the census numbers has a 
different effect resulting in different components of 
loss. 

In the most general setting, loss functions should 
reflect the cost to society of data collection, data 
processing, and data dissemination as well as the costs
of basing decisions on imperfect information. Decisions
on data collection procedures themselves are influenced
by costs difficult to evaluate. (For example, what is
the cost to the respondent of one additional question on
the census form?) Thus, loss functions influence census
data collection procedures as well as estimation
procedures, and hence considerations of loss are involved
in decision making for census procedures besides those
involved with adjustment.

A discussion of loss functions should not be limited
to issues related to the question of adjustment. The
need to determine an appropriate loss function underlies
most of the decisions that the Census Bureau makes. This
determination of an appropriate loss function is typically
accomplished, necessarily, without understanding the
precise costs associated with various decisions. The
panel feels that loss function considerations enter
importantly into all aspects of census methodology and
the panel's recommendations in other chapters have
implicitly reflected such considerations. We are
formally discussing loss functions in the context of
adjustment because of the focus in the public debate on
how to measure the consequences of introducing adjustment
procedures into the census.

In what follows we first discuss loss functions from
the point of view of the uses of the census numbers,
after which we consider their relationship to adjustment
procedures.



A User's View of Loss Functions and Adjustment 

Concern about census coverage error arises less because
of net national undercoverage than because of differential
undercoverage by geographic location and demographic
group. Differential undercoverage causes differences in
political representation and distribution of public monies
from the allocation that would result if a completely
accurate census could be taken, differences that may work
to thwart the intent of laws governing representation and
fund distribution and that are often perceived as unfair.
One of the principal reasons for adjustment of the counts
of census data would be, by reducing differential
coverage errors, to reduce the impact of these errors on
political representation, fund distribution, and other
public programs.




Because the data produced by the decennial census have 
many uses, the benefit of accuracy in the published num- 
bers is difficult to measure. Indeed, the benefit may 
vary from use to use. Congressional apportionment, for 
example, requires only population totals by state, whereas 
the revenue sharing formula uses population and per capita 
income for each incorporated place. Whether adjustment 
of population totals by state resulting in more accurate 
congressional apportionment will also result in more 
accurate distribution of revenue sharing monies may depend 
on how the adjustment is distributed within the state and 
what, if any, adjustments are made to the per capita 
income estimates. How the different loss functions can 
or should be reconciled in order to preserve consistency 
between the uses of census data for different applications 
is an issue for which the panel has in the abstract little 
advice to offer.

Even for any single given use of census data, the bene- 
fit of adjustment may vary from place to place. Suppose, 
for example, that midwestern central cities were grouped 
into a domain and the census results for each city ad- 
justed by the same formula. Since the precise under- 
counts and characteristics used in the adjustment for 
each city will differ among the cities, after adjustment 
some cities will be closer to the "true" count than 
others. Adjustment might improve accuracy in these 
cities as a group, but not in all cities equally, nor in 
every city. Nor would they benefit equally, and some 
might be adversely affected (lose federal funds or 
representation).

It must be accepted that no adjustment procedure can 
be expected to simultaneously reduce the error of all 
census information for every location in the United 
States. Rather, adjustment should be undertaken when 
there is reasonable certainty that appreciable reduction 
in the general differential coverage error will be 
achieved. A relatively trivial reduction would not be 
worthwhile, since adjustment will surely cost time and 
resources to implement, and doubt about whether the 
adjustment did or did not reduce differential coverage 
error would impair public confidence in census figures. 
Furthermore, knowledge of a subsequent adjustment might 
reduce public cooperation, thus lowering the completeness 
of the census count. 

For an effective adjustment procedure to be widely 
accepted, given that not all localities will benefit, it 
is important that there be as widespread understanding
and agreement as possible within the professional
community of statisticians that a general reduction in
differential coverage error is sufficiently desirable to
accept adverse impacts on some individual localities.
More important but difficult to obtain is this
understanding throughout all levels of government (see
Keyfitz, 1979).

In other words, localities need to recognize two
important points regarding adjustment. First, the
standard of comparison should not be the raw census
count. That is, an adjustment that lowers the population
count for an area may have reduced the error in the
estimate for that area as much as an adjustment that
raises the count for another area. Second, although
adjustment may increase error for some localities, the
country as a whole has benefited if adjustment has reduced
overall differential error. One further point: although
each locality will know whether its count was higher or
lower after adjustment, we can reasonably require of an
adjustment procedure that each locality's error is more
likely to be reduced than increased, and that no locality
will have good reason to believe otherwise, even post
facto.

The panel believes that it is substantially more
important to reduce the general error per person than the
general error per place. Hence, the panel does not
recommend the use of loss functions for measuring the
total error that weight each political jurisdiction
equally, e.g., that determine the proportion of the 39,000
revenue sharing jurisdictions that gained or lost through
adjustment, regardless of the number of people in each
jurisdiction. Rather, the panel believes that the
contribution to total loss attributable to an area should
reflect the size of its population.

Recommendation 7.2. In measuring the total loss
associated with an adjustment procedure, we recommend
that the contribution to this loss attributable to a
geographic region should reflect its population size.
Thus, we recommend against loss functions based solely
on the number of political entities losing or gaining
through adjustment.
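
To make the contrast drawn in Recommendation 7.2 concrete, the following Python sketch compares a criterion that counts jurisdictions with one that weights each jurisdiction by its population. The four areas and all of their figures are hypothetical, chosen only for illustration, and the function names are assumptions of this sketch rather than anything defined in the report.

```python
# Hypothetical areas: (name, true population, census count, adjusted count).
AREAS = [
    ("city", 1_000_000, 950_000, 990_000),
    ("town_a", 10_000, 9_900, 10_300),
    ("town_b", 8_000, 7_950, 8_200),
    ("town_c", 5_000, 4_980, 5_100),
]


def error_reduced(true, census, adjusted):
    """True if the adjusted count is closer to the truth than the census count."""
    return abs(adjusted - true) < abs(census - true)


def share_of_places_improved(areas):
    """Unweighted criterion: every jurisdiction counts equally."""
    improved = sum(1 for _, t, c, a in areas if error_reduced(t, c, a))
    return improved / len(areas)


def share_of_people_improved(areas):
    """Population-weighted criterion: each jurisdiction counts by its true size."""
    total = sum(t for _, t, _, _ in areas)
    improved = sum(t for _, t, c, a in areas if error_reduced(t, c, a))
    return improved / total


places = share_of_places_improved(AREAS)  # one jurisdiction in four improves
people = share_of_people_improved(AREAS)  # 1,000,000 of 1,023,000 people
```

In this made-up case only one jurisdiction in four improves, yet nearly all of the population lives where the error fell, which is the kind of divergence the recommendation has in view.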

The next section discusses the properties of several 
kinds of loss functions and considers specifically how 
they take into account population size. 



Loss Functions and Adjustment 

The classical yardstick used by sample survey researchers 
to assess the accuracy of a single number, chosen prin-
cipally for its convenient mathematical properties, is
the square of the deviation between the number and its 
true value. Whatever loss function we use to assess the 
accuracy of a single number, we still must determine a 
rule for amalgamating the losses associated with each 
number into an overall loss function for the entire set 
of numbers produced. The usual tack taken is to sum the 
individual loss functions. 

Using this rule for squared error applied to population 
gives disproportionate weight to large localities. Con- 
sider the following example. Suppose there are two areas, 
one with true population of 10,000 and estimated popula- 
tion of 11,000, and the other with true population of 
5,000 and estimated population of 5,500. The loss for 
the first area is (11,000 - 10,000)^2 = 1,000,000, the
loss for the second area is (5,500 - 5,000)^2 = 250,000,
and the total loss is 1,250,000. The larger area with 
twice the population of the smaller area and the same 
percentage error counts for four times as much in the 
overall loss function. 

Using this rule for squared error applied to relative 
or percentage error (that is, the square of the per- 
centage deviation between the number and its true value, 
or squared relative error) , also a very intuitive idea, 
gives disproportionate weight per person to small 
localities. To continue with our example, the squared 
relative error for the larger area is ((11,000 - 10,000) /
10,000)^2 x 100 = 1 percent; for the smaller area it is
((5,500 - 5,000) / 5,000)^2 x 100 = 1 percent; and the
total loss is 2 percent. In this case, the larger area 
counts for no more than the smaller area in the overall 
loss function. 
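
The arithmetic of the two preceding examples can be checked with a short Python sketch. It is only an illustration: the function names are assumptions, and the factor of 100 simply follows the scaling used in the text's example.

```python
def squared_error(estimate, truth):
    """Squared deviation of an estimate from its true value."""
    return (estimate - truth) ** 2


def squared_relative_error_pct(estimate, truth):
    """Squared relative error, multiplied by 100 as in the text's example."""
    return ((estimate - truth) / truth) ** 2 * 100


# (true population, estimated population) for the two areas in the example.
areas = [(10_000, 11_000), (5_000, 5_500)]

# Summing squared error: the larger area dominates (1,000,000 versus 250,000).
total_squared = sum(squared_error(est, true) for true, est in areas)

# Summing squared relative error: both areas contribute equally.
total_squared_relative = sum(
    squared_relative_error_pct(est, true) for true, est in areas
)
```

Under summed squared error the larger area carries four times the weight of the smaller one; under summed squared relative error the two carry the same weight, whatever their populations.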

The following argument gives an in-between notion that 
may be about right, although we make no absolutist claim 
for either the argument or the resulting loss functions. 
Tukey (1983) and Fellegi (1980b) have suggested as an 
appropriate alternative loss function that of "relative 
squared error," that is, squared error divided by the 
true value. In our example, the respective losses for
the two areas would be (11,000 - 10,000)^2 / 10,000 = 100
and (5,500 - 5,000)^2 / 5,000 = 50, with the total loss
equal to 150. In this case, the larger area with twice
the population of the smaller area and the same percentage
error also counts for twice as much in the overall
loss function.

Tukey argues for the use of relative squared error on
the grounds of its invariance properties. That is,
relative squared error has the property that the
contribution of the error for one area to the overall loss
function is proportional to its size, assuming that the
percentage error for all subareas is the same.
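
The invariance property can be illustrated with a small Python sketch. The split of the 10,000-person area into subareas of 6,000 and 4,000 is a hypothetical choice made for this illustration, and the function name is an assumption.

```python
def relative_squared_error(estimate, truth):
    """Squared error divided by the true value."""
    return (estimate - truth) ** 2 / truth


# The two areas of the running example, each overestimated by 10 percent.
two_area_total = (
    relative_squared_error(11_000, 10_000)
    + relative_squared_error(5_500, 5_000)
)

# Loss for the larger area treated as a single unit.
whole = relative_squared_error(11_000, 10_000)

# The same area split into two hypothetical subareas (6,000 and 4,000),
# each with the same 10 percent error as the whole.
parts = (
    relative_squared_error(6_600, 6_000)
    + relative_squared_error(4_400, 4_000)
)
```

With a common percentage error, the subarea losses (60 and 40) add up to the loss for the undivided area (100), so the total does not depend on how the area is partitioned.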

Another loss function that has this invariance property
and is more tractable computationally (see Kadane, 1984)
is squared error divided by the estimated value. Using
the same example, the respective area losses would be
(11,000 - 10,000)^2 / 11,000 = 91 and (5,500 - 5,000)^2 /
5,500 = 45.5, with the total loss equal to 136.5. Again
the area with twice the population size contributes twice
as much to the overall loss function. Both of these loss
functions, squared error divided by the true value and by
the estimated value, are commonly used in the analysis of
contingency tables.1
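
A brief Python sketch of this variant follows; the function name is an assumption. The figures quoted in the text are rounded, so the unrounded values computed here differ slightly from them in the last digit.

```python
def squared_error_over_estimate(estimate, truth):
    """Squared error divided by the estimated (not the true) value."""
    return (estimate - truth) ** 2 / estimate


large = squared_error_over_estimate(11_000, 10_000)  # about 90.9
small = squared_error_over_estimate(5_500, 5_000)    # about 45.45
total = large + small                                 # about 136.4
ratio = large / small                                 # the larger area counts twice
```

The practical appeal noted in the text is that the denominator is an observed quantity, so the loss does not require the unknown true value in its weighting.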

The foregoing discussion pertains to the construction
of a loss function associated with census-produced
numbers wherein it is the absolute accuracy of each number
that is to be assessed. Concern with minimizing
differential coverage error indicates the need for a loss
function that reflects not the error in each number per
se but rather that error in relation to the errors in
the other numbers within a set of census-produced numbers.
For example, we do not so much want to gauge the accuracy
of population counts for each county in a state as to
gauge whether the inaccuracies are relatively evenly
distributed across the counties. The loss functions
needing consideration here really measure not the accuracy
of the numbers but the "differential inaccuracy" of the
numbers.
If, for example, each number is 95 percent of its true
value, we would like a "differential inaccuracy" loss
function to indicate this by having a value of zero.

1. Note that squared relative error (relative to either
the true or estimated value) is different from relative
squared error and does not have the same invariance
property. All are weighted versions of one another.
Thus relative squared error and squared relative error
(relative to the true value) are simple squared error
weighted by the reciprocal of the true value and the
reciprocal squared, respectively; relative squared error
is squared relative error weighted by the true value; and
so on.

Tukey (1983) calls such a loss function a "measure of 
misproportionality" and suggests that one can use for 
each number the squared difference between the relative 
error of that number and the relative error of the number 
of the aggregate of which that number is part. Relative 
error is defined as the error divided by the true number 
or, equivalently, as the ratio of the census-produced 
number to the unknown true number minus 1. If the aggre-
gate is a state total and each component number is a
county total, then this measure would be the squared 
difference between the relative error of the county 
number and that of the state number or, equivalently, 
between the ratio of the census to the true number for 
the county and the ratio for the state. Suppose, for 
example, that our two areas of population size 10,000 and 
5,000, each of which had a relative error of 10 percent, 
were part of a larger area of population 50,000 that had 
a relative error of 8 percent. Then the measure of 
misproportionality for each of the two component areas 
would be (.10 - .08)^2 = .0004. If the area of 50,000
population instead had a 10 percent relative error, then 
the loss function for each of the components would be 
zero. 

To aggregate this loss function across counties, say, 
Tukey suggests a weighted sum of the component (i.e.,
county) loss functions, wherein the weights are the true 
values of each of the components of the sum. In our 
example, the loss for the area of 10,000 population would 
count twice as heavily as the loss for the smaller area 
in the overall loss function. 
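
The measure of misproportionality and its weighted aggregation can be sketched in Python as follows. The function names are assumptions, and for simplicity the weighted sum below runs only over the two component areas named in the example, not over the remainder of the 50,000-person aggregate.

```python
def relative_error(estimate, truth):
    """Error divided by the true value (ratio of estimate to truth, minus 1)."""
    return (estimate - truth) / truth


def misproportionality(estimate, truth, agg_estimate, agg_truth):
    """Squared difference between a component's relative error and the aggregate's."""
    diff = relative_error(estimate, truth) - relative_error(agg_estimate, agg_truth)
    return diff ** 2


def weighted_total(components, agg_estimate, agg_truth):
    """Sum of component losses, each weighted by the component's true value."""
    return sum(
        truth * misproportionality(est, truth, agg_estimate, agg_truth)
        for truth, est in components
    )


# (true value, estimate): both components are 10 percent too high.
components = [(10_000, 11_000), (5_000, 5_500)]

# Aggregate of 50,000 with an 8 percent relative error (estimate 54,000).
loss_each = misproportionality(11_000, 10_000, 54_000, 50_000)  # about .0004
loss_total = weighted_total(components, 54_000, 50_000)         # about 4 + 2

# Aggregate with the same 10 percent error as its components: no loss.
loss_proportional = weighted_total(components, 55_000, 50_000)
```

The larger component contributes about 4 and the smaller about 2 to the weighted total, matching the two-to-one weighting described above, and the loss vanishes when every number is off by the same proportion.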



Research on Loss Functions and Adjustment 

One characteristic of the loss functions given above is 
that they are general in nature and not specific to census 
data uses, except in distinguishing absolute and differ- 
ential inaccuracy. Some consideration has been given to 
use-specific loss functions, in particular in the work by 
Kadane (1984) on congressional seat allocation and the 
work by Spencer (1980b) on allocations of revenue sharing 
dollars to states. Kadane demonstrates the close rela- 
tionship between loss functions proposed by Tukey (1983) 
and the loss function underlying the method currently 
used in seat allocation. (See Appendix 7.1 for a brief 
overview of loss functions and apportionment.) 




In modeling the revenue-sharing loss function, Spencer
suggests that the components are not merely the units
receiving revenue sharing dollars but also the (one)
source of funding. For each component he postulates a
loss function that is a constant multiple of the magnitude
of overallocation (if one exists) or a possibly different
constant multiple of the underallocation (if any). The
overall loss function is an unweighted sum of component
loss functions.
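
A loss function of the general form just described might be sketched in Python as follows. This is a minimal illustration of an asymmetric, piecewise-linear loss summed without weights; it is not Spencer's own formulation. The cost constants, the dollar amounts, and the function names are all hypothetical, and the text does not spell out how the funding source's own over- or underallocation is measured, so the example treats every component generically.

```python
def component_loss(allocated, deserved, over_cost, under_cost):
    """Constant multiple of the overallocation or of the underallocation."""
    difference = allocated - deserved
    if difference > 0:
        return over_cost * difference
    return under_cost * (-difference)


def total_loss(allocated, deserved, over_cost=1.0, under_cost=1.0):
    """Unweighted sum of the component losses."""
    return sum(
        component_loss(a, d, over_cost, under_cost)
        for a, d in zip(allocated, deserved)
    )


# Hypothetical allocations (in dollars) to three components, compared with
# what each would receive under error-free data.
allocated = [120.0, 80.0, 100.0]
deserved = [100.0, 100.0, 100.0]

# Underallocation is assumed here to cost twice as much per dollar.
loss = total_loss(allocated, deserved, over_cost=1.0, under_cost=2.0)
```

With these assumed constants the first component contributes 20, the second 40, and the third nothing, for a total of 60.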

The Kadane paper exhibits a loss function for allocating
congressional seats among states whose minimization
results in the allocation procedure actually used
in Congress. In the case of revenue sharing and other
major uses of census data, the task of ascertaining an
appropriate loss function is more complex. The loss
function studied by Spencer was merely a convenient
construct, a springboard from which he could proceed to
investigate the central issue, the implications of data
inaccuracy for revenue sharing.

Research on the effect of the choice of loss function
on the effectiveness of adjustment procedures has been
limited. Spencer (1980a) provides some evidence that the
degree of improvement resulting from adjustment is not
very sensitive to choice of loss function. Schirm and
Preston (1984) studied the effect of a very simple syn- 
thetic adjustment (see the discussion of synthetic 
estimation in a subsequent section) , using only two 
demographic groups, on population proportions across 
geographic areas using a number of different loss 
functions: 

(1) The ratio of the sum of absolute errors in the 
proportions before adjustment to the sum of absolute 
errors in the proportions after adjustment. 

(2) The ratio of the sum of squares of errors in the
proportions before adjustment to the sum of squares of 
errors in the proportions after adjustment. 

(3) The fraction of total national population that 
resides in states whose proportions after adjustment are
closer to their true proportion than are their propor- 
tions before adjustment. 

They found, in a limited simulation study, that the
decision to adjust is sensitive to selection of the loss
function in that the first two loss functions offer
consistent recommendations about whether or not to adjust
in only about 60 percent of the simulated cases.
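
The three criteria listed above can be written out in a short Python sketch. The four "states," their proportions, and their populations are hypothetical values constructed so that the first two criteria disagree; they are not taken from Schirm and Preston's simulations, and the function names are assumptions.

```python
def abs_error_ratio(true, before, after):
    """Criterion (1): sum of absolute errors before over the sum after."""
    num = sum(abs(b - t) for b, t in zip(before, true))
    den = sum(abs(a - t) for a, t in zip(after, true))
    return num / den


def squared_error_ratio(true, before, after):
    """Criterion (2): sum of squared errors before over the sum after."""
    num = sum((b - t) ** 2 for b, t in zip(before, true))
    den = sum((a - t) ** 2 for a, t in zip(after, true))
    return num / den


def population_share_improved(true, before, after, populations):
    """Criterion (3): share of population in states moved closer to the truth."""
    improved = sum(
        p
        for t, b, a, p in zip(true, before, after, populations)
        if abs(a - t) < abs(b - t)
    )
    return improved / sum(populations)


# Hypothetical population proportions for four states.
true = [0.40, 0.30, 0.20, 0.10]
before = [0.42, 0.28, 0.22, 0.08]   # census proportions
after = [0.40, 0.30, 0.23, 0.07]    # adjusted proportions
populations = [40, 30, 20, 10]      # millions, hypothetical

ratio_abs = abs_error_ratio(true, before, after)
ratio_sq = squared_error_ratio(true, before, after)
share = population_share_improved(true, before, after, populations)
```

A ratio above 1 means the errors are smaller after adjustment. Here the absolute-error ratio is above 1 while the squared-error ratio is below 1, because the adjustment removes two moderate errors but enlarges two others, so the two criteria give opposite recommendations for the same adjustment.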




Seeking the source of the 40 percent inconsistency can 
bring us closer to an understanding of the impact of 
choice of loss function on the adjustment problem. We
note first that Schirm and Preston were comparing a par- 
ticular kind of adjustment procedure with no adjustment. 
That particular procedure may have been a substantial 
overadjustment on average in the situations they con- 
sidered, enough so to be worse than no adjustment in many 
cases. A milder adjustment might be better in most 
cases, by either loss function, as far as one can tell.

Second, Schirm and Preston also developed a number of 
theoretical insights on the effect of a synthetic adjust- 
ment, the most important of which was the observation 
that a synthetic adjustment will probably overcorrect the 
proportions in states in which the more heavily under- 
counted group is an unusually large fraction of the 
population and undercorrect the proportions in states in 
which it is an unusually small fraction of the population. 
This implies, of course, that simple synthetic adjustment 
is inappropriate if one can estimate or guess which states 
have these properties, or obtain information relevant or 
related to them. In their simulation study Schirm and 
Preston found that nearly two out of every three applica- 
tions of adjustment procedures will bring the majority of 
the proportions nearer to the truth, and that on average 
54 percent of the population proportions will be closer 
to the true values. Nonetheless, there is a small, but
substantial, probability that a possibly large majority 
of the population proportions is taken further from the 
true values by adjustment. 

In Schirm and Preston's paper, the choice of loss func- 
tion does have a strong impact on the choice of adjustment 
procedure. But when the choice of adjustment procedure 
depends importantly on the choice of loss function, such 
dependence suggests that the particular adjustment pro- 
cedures (or underlying models) under consideration have 
weaknesses that can be moderated or overcome, perhaps 
through a combined or compromise procedure. For example, 
one might average the procedures, or identify the regions 
for which each procedure is effective and then use the 
adjustment procedure that is most effective for that 
region. One might envision an iterative process, with 
one stage consisting of identification of the improved or 
compromise procedure and a second stage consisting of the 
determination of the regions in which the procedure may 
be improved yet further. 




Recommendation 7.3. We believe that, in general, the
results of an adjustment are likely to be affected
more by the quality of coverage evaluation data and
the models and methodology used than by the choice of
loss functions. Given a family of loss functions with
relatively similar objectives, it should be possible,
and desirable, to determine an adjustment procedure
that has good performance for most or all of them. We
recommend that the Census Bureau investigate the
construction of adjustment procedures that are robust
to a reasonable range of loss functions.



Modifying Extreme Adjustments 

There is a concern that the use of imperfect models could
result in adjustments for some small areas that were so
different from the census counts as to create a
presumption of overadjustment. This could not occur with
an ideal procedure and is very unlikely to occur for large
areas and demographic groups. However, the models
underlying the methods used for carrying down coverage
evaluation information to small areas are rough and make
little provision for a small area's special
characteristics. Even if a model is effective by any
reasonable definition of the term, it is possible that
some very small areas will be adjusted to totals that
differ substantially from the census counts. So when the
adjustment is essentially finished, a reasonable procedure
might involve checking the final adjusted estimates
against the original raw counts. One diagnostic check
would be to examine all cases in which the adjusted count
differed from the census by more than a specified
percentage. These might be labeled extreme adjustments.
The notion of protection against an extreme adjustment has
been discussed in the literature and at the Census Bureau.
We mention three procedures that could offer protection
against these extreme adjustments.
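
The diagnostic check described above might look like the following Python sketch. The area labels, the counts, and the 10 percent threshold are hypothetical, and the function name is an assumption; the text does not specify what percentage should be used.

```python
def flag_extreme_adjustments(census, adjusted, threshold_pct):
    """Return (area, percent change) for areas adjusted by more than the threshold."""
    flagged = []
    for area, count in census.items():
        pct_change = 100 * (adjusted[area] - count) / count
        if abs(pct_change) > threshold_pct:
            flagged.append((area, pct_change))
    return flagged


# Hypothetical census and adjusted counts for three small areas.
census_counts = {"A": 1_000, "B": 200, "C": 5_000}
adjusted_counts = {"A": 1_030, "B": 260, "C": 5_100}

extreme = flag_extreme_adjustments(census_counts, adjusted_counts, threshold_pct=10)
```

Only area B, whose count rises by 30 percent, exceeds the assumed threshold and would be set aside for review.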

It has been suggested by Tukey (1985) that one not
decrease any area's count as a result of adjustment.
This suggestion is based on the belief that most areas
are undercounted, in which case, at least area-by-area, a
no-decrease policy is an improvement. Presumably fewer
areas will be critical of an adjustment as long as they
are not reduced in population as a result. It appears
unlikely that such a constraint will cause undue damage
to the primary goal of improving the differential
undercount; however, it does not represent protection
against extreme upward adjustments.

Hogan (1984c) has raised the possibility of using 
rough confidence intervals as a buffer against a poor 
adjustment. These confidence intervals could presumably 
be developed along the lines discussed below in the 
section on variance estimation. In his memorandum, Hogan 
adduces a procedure (one of a family of procedures) that 
does not adjust as long as the confidence interval 
includes the original census value. However, if the 
census counts are outside the confidence interval 
surrounding the adjusted counts, the adjustment is 
reduced to the conservative edge of the confidence 
interval. As an example, should the adjustment be 7 
percent plus or minus 3 percent, the area would be 
increased by 4 percent. This method would help protect 
against extreme invalid adjustments. 
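
The rule just described can be sketched in a few lines of 
Python. This is only an illustration of the one procedure 
summarized above, not Hogan's own formulation; the function 
name and the use of a symmetric interval given by a half- 
width are assumptions made for the example. 

```python
def shrink_to_interval_edge(adjustment_pct, half_width_pct):
    """Adjustment (in percent of the census count) actually applied.

    adjustment_pct: estimated adjustment, positive for an increase.
    half_width_pct: half-width of a rough, symmetric confidence
        interval around that adjustment (hypothetical input).
    """
    lower = adjustment_pct - half_width_pct
    upper = adjustment_pct + half_width_pct
    # If the interval contains zero, it includes the census count:
    # make no adjustment.
    if lower <= 0.0 <= upper:
        return 0.0
    # Otherwise move only as far as the interval edge nearest the census.
    return lower if lower > 0.0 else upper


# The example from the text: 7 percent plus or minus 3 percent.
print(shrink_to_interval_edge(7.0, 3.0))  # 4.0
```

An estimated adjustment of 2 percent plus or minus 3 percent 
would be left at zero under this rule, since the interval 
includes the census value. 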

Finally, a policy of refusing to adjust any area up or 
down by more than so many percentage points could be 
established. This would share some of the properties of 
the component protection technique of Efron and Morris 
(1972), which was used by Fay and Herriot (1979) in their 
hierarchical Bayes adjustment of postcensal per capita 
income estimates (see Appendix 7.2 on hierarchical 
Bayesian techniques). 
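
For comparison, the no-decrease policy and the percentage 
cap can be written as simple constraints on an adjusted 
count. The sketch below is illustrative only: the function 
names and the 10 percent limit are hypothetical, and the cap 
shown is a plain truncation rather than the Efron and Morris 
technique itself. 

```python
def no_decrease(census_count, adjusted_count):
    """Never report less than the census count (no-decrease policy)."""
    return max(census_count, adjusted_count)


def cap_adjustment(census_count, adjusted_count, max_pct):
    """Limit the change, up or down, to max_pct percent of the census count."""
    limit = census_count * max_pct / 100.0
    change = adjusted_count - census_count
    change = max(-limit, min(limit, change))
    return census_count + change


def is_extreme(census_count, adjusted_count, threshold_pct):
    """Diagnostic flag: adjusted count differs by more than threshold_pct."""
    return abs(adjusted_count - census_count) > census_count * threshold_pct / 100.0


print(no_decrease(1000, 980))           # 1000
print(cap_adjustment(1000, 1150, 10))   # 1100.0
print(is_extreme(1000, 1150, 10))       # True
```

The two constraints behave differently: the first blocks 
every downward change but no upward one, while the second 
treats both directions alike. 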

These procedures would be needed either because the 
model for adjustment was imperfect or because the loss 
function was not appropriate. Since practical consider- 
ations inevitably enter into the choice of both model and 
loss function, such heuristic compromises are a conse- 
quence of our currently limited modeling and computational 
capability. All three methods for protection against 
extreme adjustments reviewed above are untested in the 
case of adjusting population counts, have quite different 
rationales and effects, and are in need of extensive 
research before they can be recommended for use. 



Error Estimation 

Our concern to this point has been with the magnitude of 
the error in the raw census data, i.e., the absolute or 
relative difference between the number produced by the 
census and the true value for that number in the popula- 
tion if one had a complete census. When one superimposes 
adjustment procedures on the raw counting process to 
produce census estimates, an additional consideration 
enters the picture, namely the statistical properties of 
the estimation procedure. We are not merely concerned 
with the error in the estimated census number, but also 
with the range of variation of that error that is 
associated with the estimation procedure adopted. 

It is convenient to focus our concerns and discussion 
in terms of two statistical properties of the estimation 
procedure, its bias and its standard deviation. The 
determination of these properties presupposes the exis- 
tence of a model of the process generating the original 
errors in the census data and, furthermore, knowledge of 
errors in whatever coverage evaluation information is 
used. Thus, the bias and variance of, say, synthetic 
estimation will be quite different depending on whether 
the unreported data are distributed over subpopulations 
and regions with probability proportional to size or with 
probabilities associated in some other specific way with 
subpopulation or region. 

For every underlying model of the census and coverage 
evaluation error process, estimates of over- and under- 
coverage and any adjusted census data derived from these 
estimates will have a range of variation associated with 
them. A program to estimate the distribution of errors 
arising from the evaluation and adjustment procedures is 
needed. Although error can probably be measured only 
imperfectly, information about the distribution of error 
is important both for users of the data and for the Census 
Bureau itself, in the same way that sampling variances in 
sample surveys are useful even though they omit informa- 
tion on response biases, imperfections in the sampling 
frame, etc. 

The possible errors of local area estimates, after 
adjustment, obviously enter into the analysis needed for 
most decisions discussed in this chapter, e.g., models to 
be used, loss functions, etc. Though we do not discuss 
error distributions and their impact for each such analy- 
sis, some comments on the types of error that affect 
adjusted data and the kind of information that is required 
for analysts appear to be useful. We shall use terminol- 
ogy generally applied in sample survey reports, since 
most users of census data are familiar with those 
concepts. 

The most commonly used models for errors in sample 
surveys are really about variability of estimates, and 
start off by decomposing estimation of errors into 
components relating to their bias and to their variance. 
Even in relatively simple sample surveys these concepts 
cannot always be defined without some ambiguity, and this 
certainly applies to the complex statistical procedures 
being considered for use in adjusting the census. 
However, the decomposition seems useful as a way of 
thinking about one of the components of errors of the 
estimates, that due to sampling variability. 

Bias can be thought of initially as the national 
overall error in the census derived from the coverage 
evaluation program estimate. Components of bias can also 
be considered, such as breakdowns by race and ethnicity. 
As in most survey situations, point estimates of bias 
terms will probably not be available (otherwise they 
could be used to modify the estimates). However, it may 
be possible to indicate likely bounds on the biases. 

The first element in the variance component consists 
of the sampling variances of the post-enumeration survey 
(or allied evaluation efforts). Given usual sampling 
methods, these variances can be estimated. Distributing 
the estimated undercoverage among local areas involves 
another set of errors. The estimation error introduced 
into the estimate for a particular local area by the 
method used to carry down the coverage error can be 
considered as having components of both bias and variance. 
It is probably useful to think of these components as 
part of the total mean square error rather than decom- 
posing into the traditional "bias" and "variance" 
terms. If the data used to provide the "improved" 
population count are from a post-enumeration survey or 
some other national sample, it should be possible to 
calculate between-area mean square errors and estimate 
parameters of the distribution of errors. An approach 
similar to the one used for evaluating corrections from 
the 1970 vacancy check may be feasible (see Waksberg, 
1970, 1971). We recognize that the adjustments to be 
used may be more complex than the fairly simple synthetic 
adjustment used in 1970, but an adaptation of the general 
error estimation approach might possibly be applicable. 
Such calculations would be interesting and useful in 
assessing the impact of the adjustment procedures on 
local area estimates. 
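
For reference, the standard identity behind the preference 
for total mean square error can be stated as follows. The 
notation is introduced here for illustration and is not the 
panel's: \hat{\theta} stands for the adjusted estimate for a 
local area and \theta for the true value. 

```latex
\mathrm{MSE}(\hat{\theta})
  = E\bigl[(\hat{\theta} - \theta)^2\bigr]
  = \mathrm{Var}(\hat{\theta})
  + \bigl(E[\hat{\theta}] - \theta\bigr)^2
```

The second term is the squared bias. An estimate of the 
left-hand side can therefore be informative even when the 
two terms on the right cannot be separated. 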

We suggest that the Census Bureau explore methods of 
implementing estimates of error distribution. It may be 
possible to carry out simulations with results from the 
1980 Post-Enumeration Program. This would be useful, not 
only as part of a dress rehearsal for 1990 techniques, 
but also to provide important information for use in loss 
functions and other factors needed for decision making. 




There is, however, another important component of 
error in the census, namely model error, that is not at 
all considered by the usual sample survey decomposition 
of errors, because traditional survey methods are model 
free. The standard sampling variability and bias cal- 
culations presume an underlying model of random sampling 
in the data collection process. Variations in this 
model, the model of nonresponse and multiple response in 
the census, will produce an additional error component. 
To provide no estimate of it because one knows little 
about how to calculate this component, and to publish 
merely the sampling variability error components (even if 
appropriately footnoted) may lead the unwary reader to 
believe that the published number represents the entire 
magnitude of the error. We therefore suggest that the 
Census Bureau investigate methods for providing estimates 
of this error component as well, for inclusion along with 
the census figures and the sampling variability error 
determinations. 

There is an alternative way to view the choice of 
adjustment and the concepts of error and bias. In this 
alternative view, one does not first produce an "adjust- 
ment procedure" and then analyze its error. Rather, one 
starts with a probability distribution of the true popu- 
lation given the census counts and other information. If 
a decision is necessary (i.e., to estimate the undercount 
geographically), a loss function is chosen (see above), 
and the optimal decision minimizes expected loss (Savage, 
1954). Thus, the adjustment procedure is implied by 
(a) what the distribution on the true population is and 
(b) the consequence of various sizes of mistakes one 
might make (i.e., the loss function). One implementa- 
tion of this alternative view for estimated undercoverage 
is given by Ericksen and Kadane (1983, 1985) and Kadane 
(1984). 
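
A small numerical sketch may make this view concrete. The 
distribution, the candidate decisions, and the two loss 
functions below are all hypothetical, chosen only to show 
that the same distribution implies different "adjusted" 
figures under different losses. 

```python
def expected_loss(decision, outcomes, probs, loss):
    """Expected loss of reporting `decision` under a discrete distribution."""
    return sum(p * loss(decision, truth) for truth, p in zip(outcomes, probs))


def best_decision(candidates, outcomes, probs, loss):
    """Candidate that minimizes expected loss."""
    return min(candidates, key=lambda d: expected_loss(d, outcomes, probs, loss))


# Hypothetical distribution of an area's true population (thousands),
# given the census count and other information.
outcomes = [100, 102, 104, 110]
probs = [0.2, 0.4, 0.3, 0.1]
candidates = range(100, 111)


def squared(d, truth):
    return (d - truth) ** 2


def absolute(d, truth):
    return abs(d - truth)


print(best_decision(candidates, outcomes, probs, squared))   # 103
print(best_decision(candidates, outcomes, probs, absolute))  # 102
```

Squared-error loss leads to the mean of the distribution 
(103) and absolute-error loss to its median (102), so the 
choice of loss function is part of the adjustment procedure 
rather than something applied afterward. 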

Recommendation 7.4. We recommend that the Census 
Bureau explore methods for providing estimates of 
errors associated with estimates of census over- and 
undercoverage, with a view to publishing such error 
estimates along with coverage evaluation results and 
any adjusted census data that may be issued. 



CONSIDERATIONS OF INTERNAL CONSISTENCY 

Adjustment of census data could create problems of 
internal consistency of macro and microdata from the 
census. Below we define what we mean by consistency in 
this context and offer our recommendation for dealing 
with this problem. 

Suppose a set of parameters (unknown population quan- 
tities to be estimated) satisfies a mathematical relation. 
Let each parameter be separately estimated, perhaps even 
optimally with respect to some specified loss functions. 
Then it does not generally follow that the estimates 
satisfy the same mathematical relation as the correspond- 
ing parameters. For purposes of this discussion we call 
the set of estimates internally consistent, or for short, 
consistent, if they satisfy the same relation as the cor- 
responding parameters. This use of the word "consistent" 
is quite different from the usual usage in statistics, 
namely, that, as the sample size increases to infinity, 
the estimator converges in probability to the correspond- 
ing population parameter. 

The panel believes that there is a valid distinction 
to be made between the use of statistics designed speci- 
fically for a certain purpose and those designed to be 
used by many people to serve a wide variety of purposes. 
In the former case the only criterion that should be 
operative is that the statistics are "best" for the 
prescribed purpose; they may not be consistent with one 
another, and there is no requirement of consistency with 
other statistics or data sets. Typical cases of the 
latter situation are outputs from the general purpose 
survey vehicles of government statistical agencies. Given 
the exceptional range of output and its widespread use, 
internal consistency is an important quality for general 
purpose statistics. 

The issue of internal consistency is important in rela- 
tion to the possibility of adjusting census counts on the 
basis of combining coverage evaluation survey results 
with modeling. Should adjustment be used, two basic 
alternatives would arise: a set of adjusted population 
estimates could coexist with the unadjusted counts 
implicit in the census microdata set; or the microdata 
set could be adjusted. 

In the former case substantial public confusion is 
likely to arise. It is difficult to explain, for example, 
that the population of a county is not necessarily the 
sum of the populations of its component school districts; 
or to explain that the population shares might only add 
up to 97 percent. Similarly, if the microdata set is 
unadjusted but some higher-level aggregates are adjusted 
for the estimated undercount, then any substantive 
tabulation of characteristics at, say, the county level 
will necessarily have sums for groups or areas at a 
higher level of aggregation different from the published 
adjusted population of the county. Given the availabil- 
ity of adjusted counts for higher levels of aggregation, 
the panel believes that many users will perform their own 
adjustment of lower-level data, but presumably less 
effectively than the Census Bureau, at greater total cost, 
and with the result that different sets of numbers will 
be in use. 

The panel recognizes that current census methodology 
produces tabulations that are not consistent in every 
respect. Specifically, the data products prepared from 
the long-form records are not always consistent with 
corresponding tabulations in data products produced from 
the short-form records. The Census Bureau uses iterative 
proportional fitting to promote consistency of the short- 
form and long-form data (see Appendix 3.2 and further 
discussion in this chapter), but inconsistencies remain. 
However, each data product from the census is internally 
consistent. For example, the marginals of a tabulation of 
income by race and sex contained in a summary tape file 
based on the long-form records agree with the marginals 
from the same product of a tabulation of occupation and 
ancestry by race and sex, and totals for smaller geo- 
graphic areas agree with those for larger areas. It is 
consistency in this respect that the panel supports. 
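
As background for readers unfamiliar with iterative 
proportional fitting, the following is a generic two-way 
sketch of the technique. It is not the Census Bureau's 
production procedure (for which see Appendix 3.2); the 
function name, the table, and the target marginals are 
hypothetical. 

```python
def ipf(table, row_targets, col_targets, iterations=50):
    """Scale a two-way table so its marginals match the targets.

    Rows and columns are rescaled alternately; with positive cells and
    targets that share the same grand total, the table converges.
    """
    cells = [[float(v) for v in row] for row in table]
    for _ in range(iterations):
        # Scale each row to its target total.
        for i, target in enumerate(row_targets):
            total = sum(cells[i])
            if total > 0:
                cells[i] = [v * target / total for v in cells[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in cells)
            if total > 0:
                for row in cells:
                    row[j] *= target / total
    return cells


# Hypothetical sample-based table forced to agree with complete-count totals.
fitted = ipf([[40, 60], [30, 70]], row_targets=[110, 90], col_targets=[80, 120])
for row in fitted:
    print([round(v, 2) for v in row])
```

After fitting, the row and column sums of the table agree 
with the targets, while the association among the cells in 
the original table is preserved. 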

Consequently, under the assumption that adjusted popu- 
lation estimates for some geographic aggregations are 
produced, the panel favors the alternative of carrying 
the adjustment down to the level of the microdata. Two 
basic approaches are available to accomplish this, imputa- 
tion and weighting. Imputation consists of "creating" 
the required number and kinds of people in each area and 
assigning to them a full range of census characteristics, 
including the detailed set of geographic codes. Weighting 
attaches to each micro record a weight calculated in such 
a manner that the sum of the weights of records corre- 
sponding to an area for which an adjusted count is 
available is equal to that count. Some methods for 
carrying adjustment down to the level of microdata are 
given below in the section on procedures for adjustment. 
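
The defining property of weighting can be illustrated with 
the simplest possible scheme, a uniform weight within each 
area. This is a sketch only: the function name, area codes, 
and counts are hypothetical, and a practical scheme would 
vary weights by demographic group and household type as 
well as by area. 

```python
from collections import Counter


def uniform_area_weights(record_areas, adjusted_counts):
    """One weight per micro record, so that weights in each area sum
    to that area's adjusted count.

    record_areas: area code of each micro record, in file order.
    adjusted_counts: mapping from area code to adjusted population.
    """
    census_counts = Counter(record_areas)
    return [adjusted_counts[a] / census_counts[a] for a in record_areas]


# Six enumerated persons in two areas; area "A" is adjusted from 4 to 5.
areas = ["A", "A", "A", "A", "B", "B"]
weights = uniform_area_weights(areas, {"A": 5, "B": 2})
print(weights)  # [1.25, 1.25, 1.25, 1.25, 1.0, 1.0]
```

Any tabulation computed as a sum of weights then agrees with 
the adjusted count for the area, which is the sense of 
internal consistency discussed above. 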

Weighting and imputation are closely related and each 
presents some nontrivial problems. Problems arise because 
the census consists of at least three distinct but 
related microdata files: persons, families, and house- 
holds. Therefore, together with the total number of 
persons, the numbers of families and households also have 
to be adjusted. Previous evaluation studies show that 
the total number of persons missed has significant pro- 
portions of both: (a) persons missed in partially 
enumerated households and/or families (these cases do not 
affect the number of families or households but do affect 
family and household characteristics) and (b) persons 
missed in completely missed households or families (these, 
of course, affect the number of such units). Moreover, 
the proportion of persons missed either in partially 
enumerated households, or in completely missed households, 
is known to vary from rural to large urban areas and is 
likely to vary by race, age, and other characteristics. 
If adjustment is to be carried out, then it is desirable 
to estimate the most important of these proportions so as 
not to cause possibly serious damage to such basic statis- 
tics as average family size. 

The discussion of methods for carrying down an adjust- 
ment later in this chapter notes other problems posed by 
the various methods in addition to the problem of properly 
representing both person and family characteristics. 
Careful evaluation of these methods is essential to the 
implementation of an appropriate adjustment procedure. 

The discussion above relates to the range of outputs 
provided by what is publicly and officially referred to 
as "the census." For special circumscribed purposes it 
may well be possible and desirable to compute special 
estimates not consistent with "the census." Finally, the 
panel's brief for consistency is not meant to preclude 
release of unadjusted census numbers for research pur- 
poses. For example, records on the public use microdata 
sample files could be flagged in such a way that re- 
searchers could analyze unadjusted data. As discussed in 
the next section, it may also be necessary to release 
unadjusted census figures prior to the availability of 
final adjusted numbers. 

Recommendation 7.5. The panel believes that it is 
important to strive for internal consistency of 
published census figures. Should adjustment appear 
feasible and effective, methods exist for distributing 
adjusted totals for aggregated groups down to subgroup 
values. We recommend that one of these methods be used 
to achieve internal consistency of census figures. 



CONSIDERATIONS OF TIMING 

As currently specified by law, the Census Bureau is 
required to meet two deadlines for specific data products 
from the decennial census: a deadline for the submission 
of state population counts within 9 months after Census 
Day for purposes of reapportionment of the House of Repre- 
sentatives, and a deadline for submission of small-area 
population counts within 12 months after Census Day for 
purposes of redistricting (see Chapter 2). 

The panel assumes that the above deadlines are likely 
to be in effect for the 1990 decennial census. It is 
also reasonable to suppose that there will be pressure 
in censuses after 1990 for the release of information to 
meet deadlines at least this prompt. It is certainly 
conceivable (and considered to be likely by some members 
of the panel) that it will not be possible to prepare 
adjusted data in time to satisfy the above time con- 
straints, especially the December 31st deadline for 
reapportionment. Another possibility is that relatively 
crudely adjusted state totals could be prepared by 
December 31st, with more carefully adjusted and more 
detailed data available later. 

Given this contingency, the Census Bureau could first 
release the unadjusted census figures, to be followed by 
the adjusted figures when they become available. The 
first products might be identified as preliminary (or 
interim), and the later products labeled as final (or 
revised). Then users, public and private, could decide 
whether it was worthwhile for their purposes to wait for 
the adjusted figures. Included in this scenario is the 
possibility of the preliminary figures being used only 
for the purpose of reapportionment. This assumes that 
either the adjustment would be ready in time for redis- 
tricting, or that the redistricting deadline could be 
postponed enough to accommodate adjustment. In this 
special case, the inconsistency brought about by the 
existence of two sets of books would be limited to the 
counts by states. There is another possibility, that the 
Census Bureau might be asked to release only adjusted 
figures and users would wait until they were available. 
Then, of course, the deadlines given above would possibly 
have to be extended. 

Each of the above scenarios is troublesome. The 
release of two sets of books would raise the specter of 
states, litigants, researchers, and others arguing over 
which set was the "proper" one to consult. It is easy to 
imagine the furor caused by states that would lose seats 
in the House of Representatives on the basis of the change 
from unadjusted to adjusted figures. It is also easy to 
imagine an equal employment opportunity case resting on 
the determination of whether the relevant percentage of 
minorities living in an area was the adjusted or unad- 
justed statistic. This possibility could be avoided if 
adjusted data could be released in time for redistricting, 
thereby releasing only one set of books with microdata. 
The other scenario, i.e., the postponement of reapportion- 
ment and redistricting until possibly late 1991 (which 
corresponds to the date of completion of an initial ver- 
sion of the 1980 coverage evaluation program), extends 
the period during which political representation is based 
on population counts a decade or more out of date. Even 
if the postponement is politically acceptable, the mis- 
proportional representation implied by delay needs to be 
weighed against that implied by use for apportionment 
(and possibly redistricting) of figures somewhat less 
accurate than a careful adjustment might produce. 

The existence of revised estimates is common today in 
the statistical system of the U.S. government. Statistics 
of the gross national product, energy production, con- 
sumer prices, and others all experience revisions and 
alterations, some on a regular basis, others as needed. 
Users accept the price of revisions and inconsistency 
(two sets of books) as the necessary cost of accuracy. 
Continued reliance for a decade on less accurate census 
data because adjustments to increase accuracy could not 
be completed by a specific date would, under that 
scenario, deny the country the potential benefits of more 
accurate data for uses other than apportionment. 

The burden of choice among the above scenarios, which 
is essentially political, should not be left to the Census 
Bureau alone. Assuming that adjustment turns out to be 
feasible and desirable, but that adjusted data cannot 
meet legislated deadlines, it would be important to have 
a firm expression from Congress as to which scenario is 
preferable. 

Recommendation 7.6. Census data used for reapportion- 
ment and redistricting are required by law to be pro- 
duced no later than specific dates. It is possible 
that adjustment of the 1990 census will prove feasible 
and effective in all respects, except for the ability 
to meet the required deadlines. This should not 
necessarily preclude subsequent issuance of adjusted 
data for other uses. In this situation, we recommend 
that the Census Bureau seek determination by Congress 
of whether it desires that adjusted data be used and 
will therefore extend the deadlines, or wishes to 
adhere to current deadlines and will therefore stipu- 
late the use of unadjusted (or partially adjusted) 
data for reapportionment and redistricting. 



PROCEDURES FOR ADJUSTMENT 

Inputs to Adjustment Procedures 

Adjustment of the census must be based on one or more 
sources of information on the number of persons likely to 
have been missed, either nationally or in a given geo- 
graphic region. Efforts to obtain good estimates of 
census over- and undercoverage have now been under way 
for four decades, and several methods exist for estimating 
coverage error (see Chapter 4). Up to now, these esti- 
mates of coverage error by race, sex, and age were made 
available to the public in the form of published reports, 
but they were not used to adjust the census results. 

Among the reasons for not making such adjustments have 
been limitations on the quality of the information used 
to estimate coverage error and delayed availability of 
such information, as well as concerns about the public 
acceptability of adjusted data and about their legality 
for certain uses. Of these considerations, this panel 
has been concerned primarily with the technical pos- 
sibilities for obtaining improved coverage estimates and 
adjustment techniques and using them in a timely manner. 
If the technical capability for adjusting the census in 
such a way as to increase its accuracy exists, we believe 
public acceptance of adjusted data will follow. (We note 
that there have been numerous questions raised about the 
quality of unadjusted data in 1980, evidenced in part by 
the litigation on this issue, and it is unclear that 
adjustment would appreciably increase the public concern.) 
Our detailed recommendations for improved evaluation of 
census coverage are given in Chapter 8. Here we briefly 
summarize the basic approaches to coverage evaluation 
that are available. 

Until 1980, the Census Bureau felt that the best 
source of information about the completeness of the 
census counts was the demographic method first developed 
by Coale (1955). This method is designed to provide 
estimates of differential coverage by demographic groups. 
However, reasonably accurate estimates by geographic 
region within the country, using this methodology, have 
so far not been possible. A further limitation, which 
proved quite serious with respect to the 1980 census, is 
the failure of these estimates, as traditionally con- 
structed, to include any estimate of the number of undocu- 
mented aliens. Several million undocumented aliens are 
believed to have been resident in the United States in 
1980 and a substantial, but unknown, fraction of them to 
have been counted in the census. Thus, the census count 
included a large group of people (not readily identifiable 
in the census data) excluded from the demographic esti- 
mate. This definitional difference was of sufficient 
practical importance by 1980 to severely limit the use- 
fulness of demographic analysis for coverage evaluation 
or for adjustment. Estimates of the number of undocu- 
mented aliens counted in the 1980 census by age, race, 
and sex have now been constructed and used to develop 
demographic estimates of the coverage of the legally 
resident population (see Passel and Robinson, 1984; 
Warren and Passel, 1983). However, only very rough 
estimates exist for the total population of undocumented 
aliens, and, hence, demographic analysis can at this time 
provide only very rough estimates of the coverage of the 
total population. In addition, the lack of subnational 
detail remains a serious limitation on the use of esti- 
mates from demographic analysis in any adjustment pro- 
cedure, especially for reducing differential geographic 
coverage errors. 

A program similar to the 1980 Post-Enumeration Program 
(PEP) provides an alternative approach to estimating 
coverage errors in the 1990 census. In Chapter 8, we 
identify four major areas of PEP methodology in need of 
improvement, some with more, others with less, concrete 
suggestions on how to proceed. Progress with respect to 
most of these problem areas is needed for PEP to provide 
usable estimates for adjustment, either to substantially 
reduce the errors in PEP or at least to obtain a 
substantially better understanding of the combined impact 
of the errors on the resulting estimates. 

Other possible approaches to estimating coverage 
error, not used for coverage evaluation in the United 
States except in a testing mode, include the reverse 
record check, which has been used in Canada (see Fellegi, 
1976), systematic observation in a sample of areas, and 
the matching of census results against a sample drawn 
from the union of several lists of persons. We discuss 
some of these alternatives and give our recommendations 
for further research in Chapter 8. 

Below we discuss possible technical approaches to the 
use of information generated from coverage evaluation 
programs for purposes of adjusting the decennial census 
data. Since the estimates derived directly from the 
coverage evaluation programs are necessarily restricted 
to a limited number of large areas, and since it is 
desirable for many purposes that the adjustment apply to 
small areas, adjustment procedures naturally separate 
into two components: (1) the manner in which the 
estimates are derived for the limited number of large 
areas and (2) the manner in which the estimates are 
carried down from these larger areas to smaller areas. 
The objective of consistency argued for above results in 
the need for adjustment down to the level of the person 
or household, i.e., for some type of reweighting or 
imputation. Therefore, our discussion of methods for 
carrying the adjustment down to small areas focuses on 
carrying down to the level of the person or the 
household. 



Methods for Starting Out 

Combining Estimates from Different Programs 

It appears that for 1990 the Census Bureau plans to 
concentrate on two techniques of coverage evaluation: 
demographic analysis and some version of a pre- or post- 
enumeration survey. In future censuses, additional tech- 
niques may become central to the coverage evaluation 
program. Indeed, the panel recommends in Chapter 8 that 
the Census Bureau, as part of its 1990 research program, 
work on developing other techniques of coverage evalua- 
tion, specifically, reverse record checks and systematic 
observation. Given the substantial indeterminacies 
involved, not surprisingly, different evaluation programs 
yield differing estimates of census errors. This is true 
even for variants of the same procedure. The Census 
Bureau has produced a range of coverage error estimates 
from the 1980 PEP based on alternate assumptions (e.g., 
about the nonresponse in PEP) and did not reach a con- 
clusion as to which estimate was best. Thus, there 
arises the problem of combining information from various 
coverage evaluation programs. 




One method of combining the information from the dif- 
ferent evaluation programs is to use some methods to 
"benchmark" others. This approach is used when the 
totals from one method are considered to be much more 
reliable than the totals from another, even though the 
internal proportions of the latter are useful. For 
example, it is quite likely that the estimate of the 
black national undercount for various age-sex groups 
derived from demographic analysis will be quite accurate 
in 1990, since it will be relatively unaffected by the 
treatment of undocumented aliens. 

A more general approach, encompassing benchmarking, is 
to consider combining the estimates from the various 
programs in some way. There has been very little work to 
date on the problem of determining reasonable weights to 
use in averaging the information from, say, demographic 
analysis and the PEP, or from various PEP estimates based 
on different assumptions to deal with nonresponse and 
record matching problems. More research needs to be 
conducted to identify models that might be useful in 
trading off the strengths and weaknesses of the various 
census coverage measuring instruments so as to form a 
superior estimate. 



Modifying Estimates From Within One Program 

Tukey (1985) has suggested that an effective way of 
tabulating the information from the PEP or a reverse 
record check with a view to subsequent adjustment is 
through cross-classification by homogeneous domains. 
That is, instead of directly producing estimates for 
political entities such as states, which have hetero- 
geneous populations and therefore on average do not 
differ very greatly in coverage error, estimates can be 
produced for combinations of areas that are homogeneous 
on variables believed to be related to the undercount. 
For example, one domain might comprise central cities in 
northeastern industrial states and another nonmetro- 
politan areas in the Southwest. A benefit from the use 
of these domains is that the homogeneity within strata 
should result in lower variances within strata. 

Given estimates at the level of homogeneous domains, 
it may still be possible to improve on these estimates. 
One can think of the PEP information as being composed of 
a systematic component and a random component. If the 
systematic component can be effectively modeled, the 
model can be used to modify the raw PEP estimates of 
undercoverage. 

Regression is one technique for distilli 
tematic component from the observed coverag 
the PEP. Variables are examined for their 
explain the differences in PEP from domain 
these variables (or close surrogates) can b 
and how they affect the PEP estimates deteri 
model relating the PEP estimates to these v 
help reduce some of the random fluctuation j 
PEP estimates. The choice of these variabl 
considerations such as the strength of the : 
to the observed PEP undercount estimates an 
and availability of the data. Ericksen and 
Ericksen, 1984) have proposed an adjustment 
uses as variables for each area: percentage 
(black and nonblack Hispanics) , -percentage < 
census (i.e., the method used largely in ru: 
where enumerators called in person to colle< 
forms), and the crime rate. The testing of 
variables for their explanatory power and r< 
already started, should continue so that the
models are better understood, and the pool of
variables is better developed.

In considering regression, another benefit from the
use of domains is obtained. The values of the covariates
for domains that enter into the estimation of the regression
coefficients should have wider ranges than they would
have if one were using states because the domains will
be less alike than the states. This increased spread of
values is known to reduce the variance of the estimated
regression coefficients.
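
The effect of a wider range of covariate values can be made concrete in the simplest case. As a sketch only, assume a single covariate $x_d$ for each domain $d$, an observed coverage estimate $y_d$, and a straight-line model with independent errors of common variance $\sigma^2$ (these symbols and assumptions are introduced here for illustration and are not part of the panel's text). The usual least-squares results are then:

```latex
\hat{\beta} = \frac{\sum_d (x_d - \bar{x})(y_d - \bar{y})}{\sum_d (x_d - \bar{x})^2},
\qquad
\operatorname{Var}(\hat{\beta}) = \frac{\sigma^2}{\sum_d (x_d - \bar{x})^2}.
```

The denominator grows as the covariate values spread out, so under these assumptions units that differ more from one another give a smaller variance for the estimated coefficient.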

The hierarchical Bayesian method (see Appendix 7.2),
advanced by Ericksen and Kadane (1985) in this
context, is one technique for assessing the degree to
which the systematic part of the PEP has been modeled
and for assigning weights to the regression estimate and
the estimate derived directly from the PEP.
Roughly, the observed sampling variances of the PEP
estimates are compared with the estimated variance due
to the regression. Estimates are combined with weights
inversely related to their variances, those
with less variance getting more weight. As with
all models, the hierarchical Bayesian method makes a
number of assumptions, which should be analyzed to the
extent feasible to determine the degree to which they
obtain. Freedman and Navidi (in press) question the
validity of the assumptions underlying the application
cited above and observe some lack of robustness with
respect to departures from these assumptions as well as
the choice of variables. Their work emphasizes the need
to validate the assumptions underlying any models made
use of in adjustment.

Tukey (1985) has proposed a similar adjustment tech-
nique that uses the regression estimates alone, uncombined
with the observed PEP estimates. The relative merits of
regression uncombined with the direct PEP estimates, the
observed PEP estimates uncombined with regression esti-
mates, or the combinations possible through the use of
hierarchical Bayesian techniques as well as other models
need to be researched.



Methods for Carrying Down 

Assuming that usable adjusted estimates have been created 
for domains or states, it still remains to carry the 
adjustment down to lower levels of geographic and demo- 
graphic aggregation. At least four methods have been 
advanced for this purpose: (1) synthetic estimation, (2)
iterative proportional fitting, (3) imputation, and (4) 
regression. We discuss these techniques in turn below. 



Synthetic Estimation 

The synthetic method, in the context of adjustment, is
defined as an estimation process that apportions an over-
count or undercount for an area to subareas on the basis
of the population sizes of the subareas (see Hill, 1980).
This is usually done by maintaining the larger area's
under- or overcount percentages for demographic groups in
the subareas. For example, suppose that there were two
demographic groups, I and II, and we were calculating a
synthetic estimate of the undercount for a small area
A1 within the larger area A. Also suppose, for the
larger area, the census and a coverage evaluation program
each estimated the population counts illustrated in Table
7.1. The synthetic method would now assume a 5 percent
overcount of group I individuals and a 25 percent
undercount of group II individuals in every subarea of
area A. Thus, the synthetic estimate for area A1 as
shown in the table would be 20 × (95/100) + 9 × (40/30) = 31.



TABLE 7.1 Simple Example of Synthetic Estimation

                                 Demographic   Demographic
                                 Group I       Group II

Area A
  Census count                       100            30
  Coverage evaluation
    program estimate                  95            40

Area A1
  Census count                        20             9
  Synthetic estimate                  19            12

NOTE: This procedure can be carried out for any number of subareas.
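
The arithmetic behind Table 7.1 can be sketched in a few lines. The function name and the dictionary layout below are illustrative choices, not part of any Census Bureau procedure; the counts are those shown in the table.

```python
from fractions import Fraction


def synthetic_estimate(area_census, area_evaluation, subarea_census):
    """Apply each group's area-level coverage ratio to the subarea counts."""
    estimates = {}
    for group, count in subarea_census.items():
        # Ratio of the coverage evaluation estimate to the census count
        # for the larger area, assumed to hold in every subarea.
        ratio = Fraction(area_evaluation[group], area_census[group])
        estimates[group] = count * ratio
    return estimates


area_a_census = {"I": 100, "II": 30}
area_a_evaluation = {"I": 95, "II": 40}
area_a1_census = {"I": 20, "II": 9}

a1 = synthetic_estimate(area_a_census, area_a_evaluation, area_a1_census)
total = sum(a1.values())
print({group: float(value) for group, value in a1.items()}, float(total))
```

Exact fractions are used so that the group estimates of 19 and 12, and their sum of 31, are reproduced without rounding error.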



Adjustment via synthetic estimation involves reweight-
ing each individual belonging to a domain and demographic
group by the ratio of the count determined by the coverage
evaluation program and that determined by the census.
Since this procedure reweights individual records, it
results in a consistent data set. To arrive at the
resulting estimate for any subnational area, the weights
for the individuals residing within the area are summed.
(Appendix 7.3 contains a discussion of a problem encoun-
tered through the accumulation of synthetic estimates.)



Iterative Proportional Fitting 

Iterative proportional fitting is a generalization of
synthetic estimation to multiway fitting. Consider
a two-way matrix or table, for which new totals are given
for both rows and columns. A synthetic estimate can be
computed for each row so that the row totals of the
adjusted table agree with the new row totals. Then the
columns of the adjusted table can also be adjusted by
synthetic estimation, after which one returns to adjust
the rows, then the columns, etc., with the entire process
iteratively continued. Convergence will occur in most
practical situations. This procedure generalizes to any
multiway contingency table. The resulting estimated
table entries will be completely consistent with the new
totals for marginals of the multiway table.
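
The alternating row and column adjustments described above can be sketched for a two-way table as follows. This is a minimal illustration, not the Census Bureau's production routine; the function name, the tolerance, and the 2 x 2 table of counts are all hypothetical.

```python
def ipf(table, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Alternately rescale rows and columns until both margins match."""
    cells = [list(map(float, row)) for row in table]
    for _ in range(max_iter):
        # Row step: a ratio (synthetic) adjustment of each row to its new total.
        for i, target in enumerate(row_targets):
            row_sum = sum(cells[i])
            if row_sum > 0:
                cells[i] = [c * target / row_sum for c in cells[i]]
        # Column step: the same ratio adjustment applied to each column.
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in cells)
            if col_sum > 0:
                for row in cells:
                    row[j] *= target / col_sum
        # Columns now match exactly; stop once the rows match as well.
        worst = max(abs(sum(cells[i]) - t) for i, t in enumerate(row_targets))
        if worst < tol:
            break
    return cells


census = [[40, 60], [30, 70]]   # hypothetical census cross-tabulation
row_totals = [105, 103]         # hypothetical new row totals
col_totals = [75, 133]          # hypothetical new column totals (same grand total)
adjusted = ipf(census, row_totals, col_totals)
for row in adjusted:
    print([round(c, 3) for c in row])
```

The two sets of new totals must share the same grand total (208 here); otherwise no table can satisfy both margins and the iteration cannot settle.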

Iterative proportional fitting is currently used by
the Census Bureau to force certain tables produced from
the sampled long-form records to be essentially consistent
with the corresponding short-form data produced on a 100-
percent basis² (see Appendix 3.2). It is a potential
adjustment procedure for using external totals to adjust
the census counts as well. As an example, one could use
totals for demographic groups (provided by an improved
demographic method) and PEP totals for domains (provided
by an improved PEP method) with iterative proportional
fitting so that the adjusted counts agreed with the more
reliable totals. (This assumes that the overall totals
of the national PEP and demographic estimates have been
made to agree, perhaps after a combination or reconcili-
ation of some kind.)

When the classification variables are related to the 
undercount, and reliable estimates of the marginal totals 
are available, iterative proportional fitting can be 
expected to result in improvements in the estimated counts 
(see Oh and Scheuren, 1978). However, when the estimates 
of the marginal totals are not reliable, or the classi-
fication variables are not related to the undercount,
iterative proportional fitting can have a detrimental 
impact on the resulting estimated counts. Research needs 
to be carried out to determine what the problems are with 
the use of this technique in the adjustment setting. 



² Consistency extends only to row and column totals of
tables that are used directly in the iterative propor-
tional fitting algorithm. For tables of data at either
different levels of aggregation or cross-tabulated with
variables not involved in the iterative proportional
fitting, the algorithm will often result in less incon-
sistency, but not necessarily complete consistency.



Imputation

Imputation (in a manner somewhat similar to that used
by the Census Bureau for unit nonresponse adjustment) has
also been proposed as a method for carrying down an ad-
justment to lower levels of aggregation. This procedure
is very closely related to synthetic estimation. Suppose
that one determined from PEP or another coverage evalua-
tion program that certain percentages of persons belonging
to various demographic groups living in a particular
domain were missed in the census. Then, instead of re-
weighting the records for the individuals belonging to
each of these demographic groups, as one would in a syn-
thetic adjustment, the number undercounted could be added
by duplicating at random the records of people already
counted in the census in that domain and demographic
group.
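
As a rough illustration of the duplication idea, the sketch below adds randomly chosen copies of counted records for one domain and demographic group. The record layout, the function name, and the fixed random seed are hypothetical; the counts reuse group II of area A1 from Table 7.1 (9 counted, coverage ratio 40/30).

```python
import random


def impute_by_duplication(records, missed, rng):
    """Return the counted records plus `missed` randomly chosen duplicates."""
    duplicates = [dict(rng.choice(records)) for _ in range(missed)]
    return records + duplicates


rng = random.Random(1990)  # fixed seed so the sketch is repeatable
counted = [{"id": k, "group": "II", "domain": "A1"} for k in range(9)]

# 9 counted persons and a coverage ratio of 40/30 imply 12 persons,
# so 3 records are added by duplication.
missed = round(len(counted) * 40 / 30) - len(counted)
adjusted = impute_by_duplication(counted, missed, rng)
print(len(counted), missed, len(adjusted))
```

Unlike reweighting, this approach yields whole records, so a fractional number of missed persons has to be rounded or randomized in some way; the simple rounding used here is only one possibility.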



Regression 

Another potential method for carrying down adjustment
to lower levels of aggregation is regression. The use of
regression methodology to carry adjustments to lower geo-
graphic levels consists of estimating the coefficients of
the covariates in a regression model at one level of
aggregation as already described, and then estimating the
undercount for each subarea by using the same model, with
covariate information for that particular subarea.

The panel believes that synthetic or iterative propor-
tional fitting methods of carrying down are superior to
simple regression because the regression model used in
this way is fitted to a more aggregated set of data than
the set to which it will be applied. Moreover, since the
covariate information for smaller areas will have more
extreme values than would have been used in fitting the
model, there is the potential for extreme adjustments.³

³ It may be confusing to some that we recommend syn-
thetic estimation but not regression, when regression is
merely a generalization of the synthetic method. The
synthetic method is a special type of regression in that
it has only one covariate, the census count, which is
presumed to be well-behaved for even fairly small areas.

However, the above comments do not rule out a regres-
sion approach that is modified by some type of limitation
of the adjustment, constraining it to lie not too far away
from the original census count (see the previous discus-
sion of this topic). For example, one can use regression
to construct weights for individual respondents such that
weighted sums match designated population totals for a
number of auxiliary variables. In this way regression
can be used to derive a consistent data set. The general-
ized regression estimator of the total can be written as
a weighted sum of the area subtotals, wherein the weights
are functions of the auxiliary variables and are so con-
strained that the weighted sum of each of the auxiliary
variables equals the known total of the auxiliary variable
used in the construction of the weights. In the census
adjustment context, the totals of each of the auxiliary
variables might be the adjusted estimates of the total
number of individuals in age-race-sex categories at a
reasonably high level of aggregation, such as state.

Huang (1978) and Huang and Fuller (1978) have discussed 
the application of the regression technique to survey 
sampling for both discrete and continuous auxiliary 
variables. A computer program that implements the 
regression weight method is available (Huang, 1983). 
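
The constraint that weighted sums of the auxiliary variables reproduce known totals can be illustrated with a linear form of regression weighting. The sketch below is not the program of Huang (1983); it assumes weights of the form w_k = d_k (1 + x_k . lam), which is one common way to write such weights, and the function names and data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [list(map(float, row)) + [float(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def regression_weights(base_weights, aux, totals):
    """Weights w_k = d_k * (1 + x_k . lam) whose weighted sums hit `totals`."""
    p = len(totals)
    # T = sum_k d_k x_k x_k^T, and the gap between targets and current sums.
    t_matrix = [[sum(d * x[a] * x[b] for d, x in zip(base_weights, aux))
                 for b in range(p)] for a in range(p)]
    gap = [totals[a] - sum(d * x[a] for d, x in zip(base_weights, aux))
           for a in range(p)]
    lam = solve(t_matrix, gap)
    return [d * (1 + sum(coef * xa for coef, xa in zip(lam, x)))
            for d, x in zip(base_weights, aux)]


# Hypothetical records: 20 persons in group I and 9 in group II,
# with indicator variables for group membership as the auxiliaries.
aux = [[1, 0]] * 20 + [[0, 1]] * 9
base = [1.0] * len(aux)
weights = regression_weights(base, aux, totals=[19, 12])
print(round(weights[0], 4), round(weights[-1], 4), round(sum(weights), 4))
```

With group indicators as the only auxiliary variables, the weights reduce to the synthetic ratios 95/100 and 40/30 of Table 7.1, which is consistent with the remark that the synthetic method is a special case of regression. Continuous auxiliary variables can be added as extra columns of `aux`.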

Finally, it should be kept in mind in considering the 
use of any of the above methods of carrying down adjust- 
ments to the microdata level that there are unresolved 
problems of allocating "additional persons" resulting 
from an adjustment to partially counted households and/or 
families. These problems, not fully researched, need 
further investigation. 



CENSUS BUREAU PLANS FOR RESEARCH AND TESTING ON ADJUSTMENT 

The purpose of the last two sections is to describe the 
research and testing plans of the Census Bureau with 
respect to adjustment and present the panel's reactions 
to these plans and recommendations for priority research 
areas. Two documents "Pretest of Adjustment Operations" 
(Bureau of the Census, 1984c) and "Requisite Planning and 
Research Relating to a Decision on Census Adjustment for 
1990" (Wolter, 1984) describe the Census Bureau's adjust- 
ment research and testing program, the former document 
relating solely to the 1986 pretest, and the latter 
document giving an overview of the research up until the 
1990 census. Much of what is discussed in these two 
papers relates to coverage evaluation and is covered in 
Chapter 8. Here we concentrate solely on the aspects 
related to adjustment. The line separating these topics 
is not easily drawn, and the reader is referred to 
Chapter 8 for further discussion. 

The paper on "Pretest of Adjustment Operations" out- 
lines a plan for the adjustment of data collected in a 
census pretest. The idea is to carry out an adjustment 
and study its timing, costs, and quality. The quality of 
the adjustment cannot be directly measured, but the Census 
Bureau intends to examine various indirect indicators of 
quality, such as: (1) consistency of independent esti- 
mates, (2) estimates of components of variance of the 
estimated undercoverage, and (3) size of nonresponse.
There is also the hope that this pretest can identify
unanticipated operational problems with carrying out an
adjustment. The product of this pretest will be adjusted
tabulations and mock-up publications.

In order to carry out an adjustment, there 
to be a pretest of a coverage evaluation progr 
same paper mentions two pretests 9 one of a pos 
enumeration survey akin to the 1980 census PEI 
and the other a pre-enuineration survey , whose 
to determine the time savings achieved by taki 
independent sample survey before the census, b 
against the possible effects on data quality i 
by sensitization of the population (see Chapte 
discussion of this point). The paper stresses 
importance of conducting both types of surveys 
same pretest site so that they can be compared 
terms. 

Wolter (1984) describes the current attitud 
approach of the Census Bureau toward planning 
research with respect to making a decision on 
The needs identified in this position paper ar 
measure coverage errors in the decennial censu 
sure errors for small areas, (c) set standards 
adjusted data against unadjusted data, (d) sup 
input to these standards, and (e) develop the < 
techniques and methods to implement an adjustm 
it be decided to do so on the basis of (d) . 

Wolter outlines five major steps as necessa 
address the above needs: (1) summarize the 191 
evaluation studies, (2) determine criteria tha 
used to assess when adjusted counts are better 
unadjusted counts, (3) develop techniques for i 
the undercoverage, (4) decide on an adjustment 
ology should the decision be made to adjust, ai 
the final decision on adjustment. Of these fi^ 
steps two, four, and five concern plans on adji 
and we discuss their .details below. Steps one 
are covered in Chapter 8. 

Step two deals with determining what criter: 
functions to use in comparing adjusted data wit 
justed data. While there are few details givei 
respect to testing and research, the basic appi 
by the Census Bureau will be to: (1) examine 1 
vious literature, including that of Fellegi (IS 
(1983), and Spencer (1980a, 1980b) , some of whJ 
discussed above? (2) examine appropriate loss f 
specific to special uses of the census data, e.g.,
the revenue sharing formula; and, finally, (3) look at the
losses incurred by an inaccurate adjustment compared with
losses when there is no adjustment. This third substep
will be accomplished by: (a) assuming one post-
enumeration survey estimate yields "true" counts and then
examining the costs of using one of the other estimates
based on alternative imputation techniques; (b) assuming
regression yields "true" counts and then evaluating the
loss incurred by using synthetic estimation; and (c)
developing a decision theory analysis to balance probable
gain from adjustment with the cost of the adjustment
program. This program is related to the discussion on
loss functions above, and recommendation 7.3 and the
accompanying text are relevant as a comment on these
plans.

Step four is concerned with the choice of methodology 
to use in carrying out an adjustment. The factors under- 
lying this decision are measures of bias and variance for 
various components of the adjustment methodology for each 
of the proposed methodologies. For example, the sampling 
variances and biases given by the post-enumeration survey 
and the potential biases through the use of a synthetic 
adjustment might be included in making such a choice. 

Finally, there is the decision on the degree of an 
adjustment. The three components making up the degree of 
an adjustment are the timing, the geography, and the 
characteristics detail. The first issue is whether evalu- 
ation data will be ready for adjustment in time for 
reapportionment, redistricting, or later. The second 
issue relates to which level the adjustment will be car- 
ried down to, e.g., states, major cities, counties, 
census tracts, enumeration districts, or city blocks. 
Finally, the third issue relates to whether any of the 
content data will be adjusted, or merely the counts. 

Wolter introduces the concept of what is labeled a 
"complete census adjustment," or an adjustment that is in 
time for reapportionment, carried down to the individual 
household level, and carried out for all characteristics 
collected on the census questionnaire. Wolter states 
that a complete census adjustment is the Census Bureau's 
preferred outcome. He also notes that the Census Bureau 
would plan to make the unadjusted data available for 
interested researchers. The panel earlier in this chapter 
expressed its preference for carrying down an adjustment 
to the microdata level in order to promote consistency. 
With regard to timing, the panel indicated that it does 
not want to ignore the benefits of an adjustment if one
can be developed effectively except for the fact that it
may not be ready by December 31, 1990. The panel comments
further on Wolter's paper in Chapter 8.



PRIORITIES FOR RESEARCH AND TESTING ON ADJUSTMENT 

The panel recognizes that there are many issues and
aspects of adjustment that need and would benefit from
additional research. The panel believes that an intensive
research program is called for in the area of adjustment
and believes that researchers both within the Census
Bureau and in the academic community can make significant
contributions to furthering the development of feasible
and effective adjustment methods. It is in the Census
Bureau's interest to encourage related studies by academic
researchers as well as to pursue a vigorous research pro-
gram of its own in the area of adjustment. As mentioned
earlier in the report, one of the major constraints on
the Census Bureau's abilities to carry out research is
the resulting increased demand on its technical staff.
The opportunity to augment this staff through the use of
academics and other researchers should therefore be
investigated, not only for issues in adjustment, but also
to perform research on many of the other research issues
suggested throughout this report.

Recommendation 7.7. The panel recognizes that con- 
siderable work is still necessary and likely to lead 
to improved procedures for adjusting census data. We 
therefore support the Census Bureau's stated plans to 
pursue, internally, research and development of adjust-
ment procedures, and we also recommend that the Census
Bureau vigorously promote and support related 
statistical research in the academic community. 

The panel concurs in the need for the Census Bureau to 
carry out an adjustment, as an exercise, in its pretest 
plans for 1986, including the preparation of adjusted 
tabulations. Only in this way will the Census Bureau 
learn how to develop an operational capability to adjust. 
The identification of any unknown logistical problems 
with adjustment needs to be made as soon as possible. 
Thus, the panel is strongly in support of the plan to 
carry out a pretest of adjustment-related operations in 
1986, even though one cannot completely determine the 
effectiveness of an adjustment in 1990 through any 
pretesting or research done in 1986. 




Recommendation 7.8. The panel supports the Census
Bureau in its plans for a 1986 pretest of adjustment
operations, including the production of mock tabula-
tions of adjusted census data. We recommend analysis
of the resulting adjusted and unadjusted data sets, to
help identify the strengths and weaknesses of the
particular methods tried.

With respect to the theoretical investigation of an 
adjustment methodology, the panel has identified research 
it would like to see pursued that is not specifically 
mentioned in either of the above papers. The appropriate- 
ness of an adjustment procedure can certainly be measured 
to some extent by investigating the agreement between 
estimates based on different models, the errors involved 
in the adjustment process, and nonresponse. Two examples 
of this were mentioned in our interim report (National 
Research Council, 1984:39,40): 

We suggest starting with the national age-race-sex 
undercount estimates derived from demographic 
analysis for 1980 and deriving from them, through 
synthetic and related means, state-level 
estimates. . . . Comparison of the synthetic 
estimates with the "direct" PEP-derived undercount 
estimates for states should then be made to see 
whether the results shed light on the feasibility 
of using synthetic estimates based on national 
demographic estimates of the undercount to produce 
state and substate undercount estimates. 

The United States should be divided into two (or 
three) blockings of about 20-60 relatively 
homogeneous and not necessarily contiguous do- 
mains. . . . Then, using the first blocking, a 
regression model should be estimated, using from 
three to six covariates, which fits the PEP under- 
count estimates derived for the domains. The same 
should also be carried out for the second blocking 
(and perhaps the third) , attempting to use a dif- 
ferent set of covariates. Estimates for substate 
regions would make use of synthetic techniques 
based on the regression estimates for the homo- 
geneous domains. Then the undercount estimates 
for the two (or three) models should be compared 
in a variety of ways. It would be interesting to 
see whether the substate regression estimates
summed to the state-level PEP estimates. The
effect of these estimates on redistricting or
reapportionment could also be examined.

Since publication of our interim report, the Census
Bureau undercount research staff has been actively
pursuing research along these lines.

There are other aspects of adjustment methods that should
also be researched. One important component of any
statistical estimation process is the assumptions that
underlie it. Thus, the degree to which the assumptions
hold for all the competing procedures should be inves-
tigated. This work has already begun through the efforts
of Freedman and Navidi (in press). The robustness of the
various procedures to assumptions should be examined as
well.

The methods proposed for carrying down the adjustment
to lower levels of geographic aggregation should also be
investigated. In some respects this is even more impor-
tant than similar investigations for higher levels of
geographic aggregation, since the methods used to carry
the information down are necessarily rough, being less
specific to the population of the area being estimated.
These methods are used as much for convenience as for
validity. The validity of the models underlying the
methods used to carry information down should also be
investigated on the basis of considerations of variance,
nonresponse, and plausibility of and sensitivity to the
underlying assumptions.

Finally, the panel believes that the impact of adjusted
data on a variety of users should be examined. This
is mentioned in the Wolter paper, but the panel believes
that it should be explicitly given priority. An investi-
gation would include the problems posed by the possible
existence of two sets of books, the difficulties brought
about by the need to allocate additional counts to house-
holds and families, and the effect on estimation for
families, should these additional counts not have imputed
family relationships.

Recommendation 7.9. We recommend that research on
adjustment include: (1) investigations of the
assumptions underlying the procedures, (2) an attempt
to empirically evaluate the more important of the
assumptions as well as the sensitivity of methods to
violation of assumptions, (3) study of methods used
for carrying down estimates to lower levels of
aggregation, and (4) a study of the impact of
adjustment on uses of census data.



APPENDIX 7.1 
A QUICK LOOK AT LOSS FUNCTIONS AND APPORTIONMENT 

The method currently used to apportion the House of
Representatives derives from Hill (1911). Let $a_i$ be
the number of seats allocated to state $i$, and $p_i$ be the
known population of state $i$. Suppose $h$ seats are to be
allocated. The algorithm used proceeds as follows:

(1) Set $a_i = 1$ for $i = 1, \ldots, 50$.

(2) Suppose $h'$ seats have been allocated, i.e.,
$\sum_i a_i = h'$. Choose a state $j$ for which
$p_j / \sqrt{a_j (a_j + 1)}$ is a maximum.
Increase $a_j$ by one, and $h'$ by one.

(3) If $h' < h$, then return to step (2).
Otherwise stop.
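
The three steps above translate directly into a short routine. The sketch below is illustrative only: the function names and the three-state populations are hypothetical, and ties between states are broken by position, a detail the steps above do not specify.

```python
from math import sqrt


def hill_apportionment(populations, seats):
    """Allocate `seats` among states following steps (1) to (3)."""
    alloc = [1] * len(populations)            # step (1): one seat per state
    for _ in range(seats - len(populations)):
        # Step (2): next seat goes to the state maximizing p / sqrt(a (a + 1)).
        j = max(range(len(populations)),
                key=lambda i: populations[i] / sqrt(alloc[i] * (alloc[i] + 1)))
        alloc[j] += 1
    return alloc                              # step (3): stop when all seats are used


def huntington_criterion(populations, alloc):
    """Sum over states of p_i squared divided by a_i."""
    return sum(p * p / a for p, a in zip(populations, alloc))


populations = [9000, 5000, 1000]   # hypothetical state populations
alloc = hill_apportionment(populations, 10)
print(alloc, huntington_criterion(populations, alloc))
```

For this small example, comparing the result with every other allocation of 10 seats that gives each state at least one seat shows that the algorithm's allocation has the smallest value of the sum of $p_i^2 / a_i$.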

It has been shown (Huntington, 1921) that this method is
equivalent to the minimization of

$$L = \sum_i p_i^2 / a_i , \qquad (7.1)$$

given that each $a_i$ is at least one. To see this,
roughly, consider that the choice at some stage of the
process has been to give the last representative to state
A by minimizing criterion (7.1). We now show that this
choice also satisfies the Hill algorithm. Assume that
state A has parameters $p_A$ and $a_A$, and state B has
parameters $p_B$ and $a_B$, respectively, for their population
and number of representatives. Since (7.1) has been
minimized, we must have that

$$\frac{p_A^2}{a_A} + \frac{p_B^2}{a_B} < \frac{p_A^2}{a_A - 1} + \frac{p_B^2}{a_B + 1}$$

for any choice of B or, equivalently,

$$\frac{p_A^2}{a_A} - \frac{p_A^2}{a_A - 1} < \frac{p_B^2}{a_B + 1} - \frac{p_B^2}{a_B}$$

or

$$\frac{p_A}{\sqrt{a_A (a_A - 1)}} > \frac{p_B}{\sqrt{a_B (a_B + 1)}} ,$$

and therefore the choice of A to minimize criterion (7.1)
also maximized the Hill criterion.

Note: Much of the discussion is taken from Balinski and
Young (1980) and Kadane (1984).

The function $L$, the Huntington criterion, does not have
the form of a loss function in a strict sense. But it is
equivalent to an index of misproportionality, as defined
above in Chapter 7. For purposes of reapportionment, the
loss due to any state could well be a function of how
relatively under- or overrepresented the people of that
state are, probably a function of $(p_i/a_i - p_+/h)^2$, where
$p_+$ is the total population of the United States.

Amalgamating these losses by forming the weighted sum
$\sum_i a_i (p_i/a_i - p_+/h)^2$ is reasonable. This can be seen from
the following argument. If state A has $k$ times the number
of representatives as state B, and if states A and B are
equally underrepresented by some apportionment of the
House of Representatives, then the number of seats needing to
change for state A will be $k$ times the number of seats for
state B. Rewriting the sum $\sum_i a_i (p_i/a_i - p_+/h)^2$ as the
sum of three terms by taking the square of the indicated
expression, it is easy to see that this loss function can
be rewritten as $\sum_i p_i^2/a_i - p_+^2/h$. Since the second
term is a constant, minimizing this index of mispropor-
tionality is equivalent to minimizing the Huntington
criterion.
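
The three-term expansion referred to above can be written out as follows. It uses only the definitions already given, together with the facts that the state populations sum to $p_+$ and the seat allocations sum to $h$:

```latex
\begin{aligned}
\sum_i a_i \left( \frac{p_i}{a_i} - \frac{p_+}{h} \right)^2
  &= \sum_i \frac{p_i^2}{a_i}
     - \frac{2 p_+}{h} \sum_i p_i
     + \frac{p_+^2}{h^2} \sum_i a_i \\
  &= \sum_i \frac{p_i^2}{a_i} - \frac{2 p_+^2}{h} + \frac{p_+^2}{h} \\
  &= \sum_i \frac{p_i^2}{a_i} - \frac{p_+^2}{h}.
\end{aligned}
```

The final term does not depend on the allocation, so the two criteria are minimized by the same allocation.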



APPENDIX 7.2 
AN INTRODUCTION TO HIERARCHICAL BAYESIAN TECHNIQUES 

An important problem in statistics is how to effectively 
weight information obtained about certain quantities from 
more than one source, with each source's means and vari- 
ances estimable but unknown. This problem arises in the 
decennial census in more than one context. For example, 
how to combine information from the raw data collected in 
the decennial census and information from the coverage 
evaluation programs is the fundamental statistical prob- 
lem faced in the determination of a method for adjustment. 
In addition, the combination of postcensal estimates based 
on sampled long-form responses for small areas with infor- 
mation from more highly aggregated areas that enclose the 
smaller areas, investigated by Fay and Herriot (1979), is 
again a problem of combining information from more than 
one source. 

Hierarchical Bayesian methods (Lindley and Smith, 1972) 
provide a technique for weighting information from differ- 
ent sources. There is a strong similarity between the 
empirical Bayesian model and the components of variance 
model (see, e.g., Kackar and Harville, 1984; Henderson, 
1975). In this section we describe the basic approach
used. 

As an example, assume that we have a process that
generates $n$ values $m_i$, so that each $m_i$ is normally
distributed with mean $m$ and variance $t^2$ (denoted
$m_i \sim N(m, t^2)$). Then, assume that the $n$ random variables $X_i$
are independently $\sim N(m_i, s^2)$, with $m$, $s^2$, and $t^2$ known.
Unconditionally, $X_i \sim N(m, t^2 + s^2)$. The problem is to
estimate the value of $m_i$ after observing $X_i$. Bayesian
methods direct one to determine the posterior distribution
of $m_i$ given $X_i$, which can be calculated given the prior
distribution for $m_i$ and the distribution of $X_i$ given
$m_i$. The Bayesian estimate of $m_i$, under squared-error
loss, is then the mean of this distribution,
$(t^2 X_i + s^2 m)/(t^2 + s^2)$. It is a weighted combination of $X_i$ and $m$, with
weights $t^2/(t^2 + s^2)$ and $s^2/(t^2 + s^2)$. Thus the two sources of
information for the value of $m_i$ are weighted in an appro-
priate fashion to arrive at a reasonable estimate. How-
ever, the estimator required $m$, $s^2$, and $t^2$ to be known.
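
The weighting in this known-parameter case can be checked numerically. The function name and the input values below are hypothetical and serve only to illustrate the formula just given.

```python
def posterior_mean(x, m, s2, t2):
    """Posterior mean of m_i given X_i = x when m, s2, and t2 are known."""
    weight_on_x = t2 / (t2 + s2)     # weight given to the observation
    return weight_on_x * x + (1 - weight_on_x) * m


# Hypothetical values: prior mean 1, prior variance 3, sampling variance 1.
estimate = posterior_mean(x=4.0, m=1.0, s2=1.0, t2=3.0)
print(estimate)
```

With a prior variance three times the sampling variance, the observation receives weight 3/4 and the prior mean weight 1/4, so the estimate lies three-quarters of the way from the prior mean toward the observation.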

One form of empirical Bayesian methodology (see Efron 
and Morris, 1973; Harville, 1976) generalizes this basic 
approach by treating m, s 2 , and t 2 as unknown parameters 
to be estimated from the data, rather than known a priori. 

The term hierarchical derives from the realization
that the modeling need not stop at a distribution for the
mean and the estimation of the parameters for this dis-
tribution. It is, for example, possible, instead of
estimating $m$ and $t^2$ from the data, to place a prior on
them. Furthermore, if the data have a hierarchical
structure such that independent realizations of $m$ and $t^2$
are available, they can be used to estimate the parameters
of that distribution. Moreover, more complicated
structures than components of variance, such as multiple
regression with "random" coefficients, can be handled.
This can go on indefinitely, as long as estimates of the
various parameters can be achieved in some way. This
nesting of models allows incomplete knowledge about quan-
tities that affect the quantity to be estimated to be
incorporated into the estimation process.

In the case of adjustment, one possible hierarchical
model might be developed based on the following reasoning.
For major central cities and states and remainders of
states with these cities in them, or for homogeneous
regions, it is reasonable to consider modeling the ratios
of the census counts to the true counts, i.e., percent-
ages of undercount. One could assume that the ratios of
the Post-Enumeration Program counts to the census counts,
denoted $p_i$, or some appropriate transformation of the
$p_i$, were distributed approximately as $N(m_i, v_i)$.
The next step might be to hypothesize a model for the
means $m_i$. One possibility for this model could be a
regression model such as that suggested in Fay and
Herriot (1979). Then, $m_i \sim N(X_i B, A)$, where $A$ is unknown.
The regression model for percentage undercount coa 
include as explanatory variables such variables as 
return rate, percentage undercount in the previous 
census, percentage minority, crime rate, percentage 
conventional enumeration, socioeconomic status, an< 
(see Ericksen, 1984). Here, $X_i B$ is assuming the role
of the mean of the "true" undercount. The variance of
the parent distribution, $v_i$, can be derived from the
Post-Enumeration Program specifications, such as the
sampling rate. However, this alone is not sufficient,
since there are other components to the variance of the
Post-Enumeration Program besides sampling variance, e.g.,
variance due to nonresponse. As a third stage, one could
estimate $B$ using classical least squares, or hypothesize
a model for it as well. The only remaining difficulty is
that of estimating $A$.



If we had hypothesized the $v_i$ to be constant over
$i$, it would be relatively easy to estimate the common
value by computing the sum of squared deviations of $p_i$
about $X_i B$, and making some adjustments. This is what
is suggested in the work by James and Stein (1961), and
later in the work by Efron and Morris (1973). However,
the variance heterogeneity forces one into a more sophis-
ticated methodology. One relatively easy but unsatisfy-
ing way around this difficulty is to assume that $A$, the
variance of the prior distribution, is not constant over
$i$, but instead, is equal to $A v_i$. Carter and Rolph
(1974) propose a method of moments estimator for the case
with constant variance, $A$, for the prior distribution,
which is a more satisfying assumption.



APPENDIX 7.3 

AGGREGATION OF SYNTHETIC ESTIMATES: 
A COUNTERINTUITIVE EXAMPLE 

While synthetic estimation is suggested for adjustment
because of its arithmetic and computational simplicity,
synthetic estimation is not necessarily an improvement
over the census counts. Let us assume the situation
depicted in Table 7.2.

An examination of the table shows that the estimates
of area totals using synthetic estimation are further
from the truth than the unadjusted census estimates for
areas I and III, and no better for area II. That is, the
synthetic values 108 and 92 are not equal to the true
counts of 100, which is the case for the unadjusted
census counts. This is indicative of the fact that the
near-optimality of synthetic estimation, discussed by
Tukey (1983) for subgroups, is not necessarily preserved
when synthetic subgroup estimates are accumulated to
synthetic geographic area estimates.
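
The arithmetic behind Table 7.2 can be reproduced with a short
sketch. It assumes that each subgroup's adjustment factor is the
ratio of the true to the census total for that subgroup over all
areas (0.9 and 1.1 here), and that the factor is applied
uniformly to every area. The variable and function names are
illustrative only.

```python
# Counts from Table 7.2: area -> (subgroup 1, subgroup 2).
census = {"I": (10, 90), "II": (50, 50), "III": (90, 10)}
true = {"I": (5, 95), "II": (45, 55), "III": (85, 15)}


def synthetic_estimates(census_counts, true_counts):
    """Scale each area's subgroup count by the overall subgroup factor."""
    n_groups = 2
    census_tot = [sum(c[g] for c in census_counts.values()) for g in range(n_groups)]
    true_tot = [sum(t[g] for t in true_counts.values()) for g in range(n_groups)]
    factors = [true_tot[g] / census_tot[g] for g in range(n_groups)]
    return {
        area: tuple(counts[g] * factors[g] for g in range(n_groups))
        for area, counts in census_counts.items()
    }


def area_totals(by_area):
    """Sum the subgroup figures within each area."""
    return {area: sum(values) for area, values in by_area.items()}


synthetic_totals = area_totals(synthetic_estimates(census, true))
census_totals = area_totals(census)
true_totals = area_totals(true)
```

The synthetic subgroup totals match the truth exactly, yet the
synthetic area totals for areas I and III move away from the true
value of 100 that the unadjusted census totals already attain.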



TABLE 7.2 An Example: Problems in Aggregating Synthetic Estimates

               Subgroup 1                Subgroup 2                Total
Area     Census  Synthetic  True   Census  Synthetic  True   Census  Synthetic
I            10          9     5       90         99    95      100        108
II           50         45    45       50         55    55      100        100
III          90         81    85       10         11    15      100         92
Total       150        135   135      150        165   165      300        300






8 

Measuring the Completeness 
of the 1990 Census 



RECAPITULATION OF MAJOR ISSUES IN COVERAGE EVALUATION 

Chapter 4 described the methods used to evaluate the 
population counts in past censuses and appraised the 
quality of the various evaluation procedures. There is 
no need to repeat the detailed information of Chapter 4 
in discussing the methods planned for 1990, but it is 
useful to summarize the main features of the evaluation 
methods. We shall also repeat key comments on this sub- 
ject from the panel's interim report (National Research 
Council, 1984). 

The methods used by the Census Bureau, or suggested by 
others for use in evaluating coverage of decennial 
censuses, can be grouped into four types: 

(1) Pre- or post-enumeration surveys, such as the 1980
Post-Enumeration Program (PEP);

(2) Reverse record checks; 

(3) Matching with administrative records, including 
multiple and composite list techniques; and 

(4) Demographic analyses. 

We later suggest a fifth method for coverage evalua- 
tion, which we call systematic observation. Systematic 
observation is a close relative of ethnographic studies, 
or resident observation. 

Starting with the 1950 census, the Census Bureau's 
evaluation of coverage concentrated on the first and 
fourth methods above. A reverse record check study was 
carried out in 1960 but its quality was judged too poor 
for it to be used. (By contrast, this procedure has been 
judged successful in Canada and considerable reliance has 
been placed on it.) Administrative list matching has been
used for special studies relating to coverage evaluation,
but not for the production of overall estimates of net
undercount.

There are known weaknesses to each of these methods,
at least in the way they have been carried out in the
past. Through 1970, the Census Bureau's judgment was
that demographic analysis provided the best estimates of
undercoverage, and these estimates were generally used in
discussions of the undercount. Subsequent events,
particularly the large level of presumed undocumented
immigration, caused the Census Bureau to anticipate that
this would no longer be true in 1980 and to rely on the
PEP for coverage evaluation of the 1980 census.

Demographic analysis relies on estimates of populations
independent of the current census, using such information
as annual figures on births and deaths, immigration and
emigration, and past census data. In earlier uses of the
method, it was recognized that the net immigration
statistics were somewhat shaky, but it was felt that a moderate
error in this component could be tolerated without an
important effect on the total estimate. However, by 1980,
the uncertainty regarding the number of undocumented
aliens in the United States changed perceptions of the
accuracy of the independent population figures. New
importance was attached to questions about the general
quality of data on immigration and emigration. For the
1980 census, demographic analysis initially showed an
overcoverage of the white population, a result that the
Census Bureau staff and most other analysts considered
unlikely. The PEP and other survey-related procedures
have the advantage over demographic analysis of providing
subnational data, although cost constraints severely
limit the number of areas for which separate estimates
can be produced. (It is probably unrealistic to assume
that reliable estimates will be available for more than
at most 100 subareas of the United States.)

The most recent Census Bureau statements indicate that
the Census Bureau intends for PEP-type surveys to be the
basic evaluation tool in 1990. Demographic analysis will
be continued, primarily for use in checking the
reasonableness of the survey results for aggregate sex-age-race
groups. The panel considers the Census Bureau to be
acting prematurely in making a decision at this time for
the evaluation method for 1990, particularly in light of
the improvements that may be possible in other methods of
coverage evaluation, as well as in the PEP. These
possibilities are discussed in a later section of this
chapter. We first repeat several assessments of
evaluation methods from the panel's interim report:

1. Each of the various methods currently used in the
United States and other countries to measure the
completeness of census coverage is subject to serious limitations,
including biases in measuring the coverage of various
population groups.

2. There is at present no reason to expect a
breakthrough in the methodology of coverage evaluation before
1990. However, some significant improvements are
possible, expected, and important.

3. There is, at this time, very little information on 
the quality of subnational estimates of coverage derived 
from any of the currently used evaluation programs. 

These assessments are not meant to discourage evalua- 
tion efforts, but to encourage the Census Bureau to con- 
tinue to explore methods of reducing the levels of uncer- 
tainty. One other general point about evaluation should 
be made. Information about the quality of the national 
census count is important in its own right. However, its 
value would be considerably increased if it could be used 
to modify population counts in subnational geographic 
areas. In Chapter 7 we have identified research whose 
successful completion might make it possible to use eval- 
uation results for subnational adjustments. For such 
modifications to be of greatest use, the evaluation re- 
sults should be known soon after the census is completed. 
While the accuracy of the evaluation methodology and 
ability to provide subnational estimates should be given 
the first priority, the ability to produce data quickly 
should also be an important criterion in choosing the 
evaluation methodology for 1990. 



APPRAISAL OF CENSUS BUREAU PRETEST PLANS FOR COVERAGE 
EVALUATION 

We begin by describing the current coverage evaluation 
research and testing program of the Census Bureau and the 
panel's views toward these programs as expressed and 
updated from its interim report. Then follows a descrip- 
tion and assessment of a recent Census Bureau position 
paper, by Kirk Wolter, on plans for coverage evaluation 
and adjustment in the 1990 decennial census. 






Current Program for Testing and Research of 

Evaluation 

1985 Pretest of Post-Enumeration Survey Methodology

The Census Bureau experienced a number of difficulties in
conducting the 1980 Post-Enumeration Program and
planned a pretest in 1985 on post-enumeration survey
methodology (Hogan, 1984a:Appendix A) to try [...]
ways of overcoming some or all of these difficulties.
The pretest involves selecting a sample of 2[...] in
Tampa, Florida, one of the two 1985 pretest sites. These
blocks will be completely relisted and matched to the
pretest census records. The matching will [...]
computer match between the sample and the census,
which will enable the Census Bureau to estimate the
overcount as well as the undercount. Nonmatches will be
followed up using many different sources, e.g., [...]
directories, the post office, local welfare [...].

Problem areas that the Census Bureau identified for
research are:

(1) Computer matching;

(2) Balancing the undercount with the overcount;

(3) Evaluating the overcount;

(4) Nonresponse research;

(5) Alternate questionnaire design;

(6) Rules on whether the current or the listed
resident should be enumerated;¹

(7) The use of the Post-Enumeration Program to
benchmark other evaluation methods of [...];

(8) Homogeneous domains and their effect on
sampling; and

(9) Limited follow-up.

Originally the Census Bureau hoped to obtain information
in the 1985 pretest about each of the problem areas listed
above. Because many of them cannot be tested
independently, the panel was concerned that the pretest would be
unable to produce meaningful results for specific [...]



¹ Rules on whether the current or the listed
resident should be enumerated in the PES refer to the
[...] movers and whether new residents or the residents
as present on Census Day are counted.




There was some indication that the Census Bureau had not 
identified methods and criteria for the evaluation of 
some of the components of this test. Furthermore, the 
likely sample size was too small to identify the
differences in alternative methods of estimating the net
undercount, because, in total, the undercount would
probably be substantially less than 5 percent. Therefore,
in our interim report, we recommended narrowing the scope 
of the 1985 pretest. The panel believed that priorities 
for the post-enumeration survey pretest should be based 
on an error profile of the Post-Enumeration Program in 
1980, and the most promising improvements investigated. 
As a result of the panel's recommendations, the Census 
Bureau decided to focus its Pretest of Post-Enumeration 
Survey Methodology on the areas of computer matching and 
nonresponse. 



Research Study on Hard-to-Count Groups 

In this pretest, which runs simultaneously with the 
Tampa post-enumeration survey, a sample from various 
administrative lists of males ages 18-40 and children 
under age 10 who live on 1985 PES blocks will be drawn in 
order to examine an administrative records matching 
approach to coverage evaluation of hard-to-count groups.
The people found on these various lists will be matched
to the 1985 pretest census and post-enumeration survey
lists to see if they were included in either (no composite
list will be created). People who do not match to either
will be followed up for verification of address and other
information. However, no tracing to determine the status 
of cases not living in the sample block at the time of 
the census will be done. The major objective is to deter- 
mine if administrative records matching is feasible as a 
technique for improving coverage of the post-enumeration 
survey. The feasibility of this approach will be measured 
by the number of individuals located who were missed by 
the census and the PES, the political sensitivity raised 
by this operation, and the accessibility of the various 
list sources. The following administrative lists were 
initially under consideration: 

(1) The 1983 Internal Revenue Service Individual
Master File;

(2) Unemployment records;

(3) Immigration and Naturalization Service files;

(4) Job Training Partnership Act files;

(5) Draft registration files;

(6) Driver's license files; and

(7) Other lists, e.g., police blotters or records of
local hospital admissions.

Since this pretest will not form a composite list,
there will be no testing of this important component of an
administrative list-based coverage evaluation program.
Many of the lists proposed for use (e.g., police blotters
and unemployment records) have been tried previously with
poor results (see Bureau of the Census, 1976a:2-[...]) and
also pose problems of duplicates. In addition, the
possible nonrepresentativeness of a composite list formed
from these administrative lists will have to be
accommodated if dual or triple-system estimation with [...]
the census list, or the census and post-enumeration
survey lists, is contemplated. For these reasons, the
panel recommended in the interim report against
proceeding with this pretest until these difficulties are
resolved. However, the panel is in favor of continued
non-field-test research of this methodology. For
example, the panel believes that research is needed in
assessing the relative advantages of various alternative
approaches to estimation of coverage of the total
population using several administrative lists (see
discussion in Appendix 4.1).
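
The dual-system estimation mentioned above can be sketched in
its simplest textbook form. The sketch assumes two lists that
capture people independently of one another and error-free
matching between them. These are the assumptions that a
nonrepresentative administrative list would strain. The
function names and counts are hypothetical.

```python
def dual_system_estimate(n_census, n_second_list, n_matched):
    """Simple dual-system (capture-recapture) population estimate.

    n_census:      persons counted in the census
    n_second_list: persons on the second list (survey or records)
    n_matched:     persons found on both lists
    Assumes the two lists capture people independently.
    """
    if n_matched <= 0:
        raise ValueError("at least one matched person is required")
    return n_census * n_second_list / n_matched


def net_undercount_rate(n_census, estimated_total):
    """Share of the estimated total that the census did not count."""
    return (estimated_total - n_census) / estimated_total
```

For instance, a census count of 900, a second list of 100
persons, and 90 matches give an estimated total of 1,000 and a
net undercount rate of 10 percent.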

As a result of the panel's interim report
recommendations, the Census Bureau focused its attention on a more
limited number of administrative lists. Otherwise, the
hard-to-count study is proceeding as described above.



The Forward Trace Study 

The Census Bureau designed the Forward Trace Study
(Hogan, 1984a:Appendix C) to test various methods of
tracing people from their 1980 census address to their
current address. The purpose is to determine which
tracing method would be most effective for use in a
reverse record check.

The success of the reverse record check in Canada
suggested the use of a similar procedure in the United
States. A major difference between the United States and
Canada in the application of this technique is the 10-year
time span between censuses in the United States,
compared with 5 years in Canada. This time difference
increases the difficulty in tracing people from the
previous census to their present residence. The Forward
Trace Study principally addresses this time effect.

The Forward Trace Study began in October 1981 when a 
sample was selected from the 1980 census supplemented by 
a sample of missed persons derived from the 1980 PEP. A 
third sample of immigrants was added later.
Unfortunately, problems arose in obtaining the fourth sample of
intercensal births, due to the sensitivity of records for
out-of-wedlock births and adoptions. The approximate
sample sizes for the four subsamples are:

(1) 1980 census      11,900

(2) People missed     4,000

(3) Immigrants        2,700

(4) Births            2,700 (proposed)

Three different tracing methods are being investi- 
gated: (1) periodic tracing with periodic personal 
contact, (2) periodic tracing with initial personal 
contact, and (3) tracing only at the end of the period. 
The three different tracing procedures will be compared 
for cost and completeness, especially for hard-to- 
enumerate groups. One concern is that the people for 
whom the first tracing method is used may become sen- 
sitized to the census, and therefore may be enumerated 
with greater or lesser frequency than the general 
population. The extent of this sensitization would have 
to be well estimated in order to reliably estimate the
degree of underenumeration from such a system.

The panel feels that the Forward Trace Study is likely 
to yield useful information as to the feasibility of using 
a reverse record check to evaluate the completeness of 
coverage of the 1990 decennial census, and therefore 
should be completed. 



Description and Critique of the Wolter Paper 

In October 1984, Kirk Wolter of the Census Bureau pre- 
sented a position paper that represented both a change 
and a narrowing of focus for the research and testing of 
methods of coverage evaluation and adjustment for the 
1990 census. Adjustment-related issues are discussed in 
Chapter 7. Here we discuss the issues related to cover- 
age evaluation. In his paper, Wolter offered the pos- 
sibility of major modifications to the Post-Enumeration 
Program used in 1980. In addition, he outlined the basis
for the decision on whether to release adjusted data, at
what time adjusted data might be released, and to what
level of geographic detail adjusted data might be
presented. We summarize this paper and the panel's reaction
to it.

i. Wolter suggested that the Census Bureau might use
an independent survey, rather than the Current
Population Survey (CPS), which was used in 1980,
as the survey of the population of the United
States for the Post-Enumeration Program in 1990.

There are many advantages to the use of an independent
survey. Restrictions come with the use of the Current
Population Survey, including the sampling design, the
timing of the survey, the type of interviewing and
follow-up used, the questions asked, etc. A survey
dedicated to coverage evaluation will give the Census
Bureau the opportunity to consider many possibilities,
including: (1) the use of administrative records in
frame development, (2) the use of a compact area sample
design, and (3) the use of more intensive interviewing
and follow-up techniques to reduce nonresponse. However,
these freedoms bring with them certain disadvantages.
The methodology underlying the Current Population Survey
is well-tested. The interviewers are skilled at their
jobs (it is suggested below that an independent survey
use, wherever possible, Current Population Survey
interviewers), and the frame development is well-understood.
Moreover, the Current Population Survey, already budgeted,
would avoid the possibly substantial costs entailed in
developing and running a new sample survey of the United
States.

ii. The paper suggests that this independent sample
be made up of compact area clusters, unlike the
sampling design of the Current Population Survey.

The advantages of a sample of compact area clusters
(such as entire city blocks) grow primarily from the
ability to concentrate the enumeration and matching
efforts on these small, geographically compact areas.
Thus, two-way matching between the sample survey and the
census records can be contemplated. The inability to
perform a two-way match was one of the major problems in
the 1980 PEP. In addition, small-area estimates
of net undercount could be used in model development and
validation with compact area clusters. An added
possibility is the use of national and local administrative
records in the same regions, also for purposes of model
development and validation.

There are also disadvantages to this proposal. The
measurement of undercoverage may not be an ideal
application for a highly clustered sample design. If
undercoverage is extremely homogeneous within clusters, the
effective sample size achieved by clustering could be
well below that of the 1980 Post-Enumeration Program,
even though the same number of individuals were sampled.
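
The loss of effective sample size under clustering can be
illustrated with the standard design-effect approximation for
equal-sized clusters, deff = 1 + (m - 1) * rho, where m is the
cluster size and rho is the intracluster correlation. The
cluster sizes and correlations used below are purely
hypothetical and are not estimates for any census survey.

```python
def design_effect(cluster_size, intracluster_corr):
    """Approximate variance inflation from sampling whole clusters.

    Uses deff = 1 + (m - 1) * rho, which assumes equal-sized
    clusters and a common intracluster correlation rho.
    """
    return 1.0 + (cluster_size - 1) * intracluster_corr


def effective_sample_size(n_persons, cluster_size, intracluster_corr):
    """Size of a simple random sample with the same precision."""
    return n_persons / design_effect(cluster_size, intracluster_corr)
```

With 10,000 sampled persons in clusters of 51 and rho = 0.1, the
design effect is 6, so the sample carries roughly the information
of 1,667 independently selected persons. With rho = 0 the
clustering costs nothing.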

On balance the panel favors proposals (i) and (ii) of 
Wolter's paper, i.e., the use of an independent survey, 
which samples compact area clusters, for use in the 1990 
Post-Enumeration Program, particularly if subsequent 
testing shows the intracluster correlations to have only 
a moderate impact on the effective sample size. 

iii. Wolter strongly puts forward the post- 
enumeration survey as the key element of the 
1990 coverage evaluation program, to the 
exclusion of methods such as administrative 
records, reverse record checks, and systematic 
observation. 

Wolter presents many arguments for the discontinuance 
of research on a coverage evaluation program based on a 
reverse record check. The reasons given are: (1) the 
Census Bureau has little experience in running reverse 
record checks; (2) the program in 1990 would have to be 
experimental, since it would be the first time this method 
was used on this scale; (3) there have been problems 
maintaining current addresses for the sample created; (4) 
unexpected difficulties have arisen in acquiring birth 
records from the states because of the sensitive nature 
of these records; (5) the Forward Trace Study is as yet 
incomplete; and (6) all indications are that a reverse 
record check would be more expensive than a post- 
enumeration survey. 

Throughout this report, one of the major themes has 
been the need for the Census Bureau, in its research and 
testing programs, to focus on priority areas, to the 
exclusion of less promising ideas. There are advantages 
to the narrowing of efforts, and coverage evaluation is 
certainly an area in which some narrowing is needed. 
Only in this way can the Census Bureau develop the 
expertise and assurance needed to implement successful 



328 

coverage evaluation techniques. However, in this in- 
stance r the panel feels that the focusing is premature, 
The panel is of the opinion that the available informa- 
tion comparing the various approaches to coverage evali .- 
tion is inconclusive as to the relative merits of these 
approaches. More information needs to be gathered befc e 
strong directions can be recommended. The panel has 
recommended a substantial amount of winnowing down else 
where in the decennial census research and testing prog am 
to accommodate a liberal approach to research and test! g 
here. 

The objections Wolter presents to further investigation
of the use of reverse record checks are not compelling.
An experimental reverse record check was a part of the
1960 coverage evaluation program in the United States,
and Canada's experience cannot be disregarded.
Furthermore, experimental programs can and should be used during
the 1990 census so as to be ready for the census in the
year 2000. Also, the serious problems associated with
reverse record checks do not seem to be any more serious
than those posed by the use of a post-enumeration survey.

As mentioned above, the post-enumeration survey has 
special problems with respect to certain populations. 
Reverse record checks, administrative list methods, and 
systematic observation are real possibilities for measuring
undercount for these groups. The panel feels that
the exclusive reliance on a post-enumeration survey 
methodology for coverage evaluation in 1990 is, at this 
time, premature. 

Recommendation 8.1. We recommend that the Census 
Bureau conduct research and tests of alternative 
coverage evaluation methodologies in addition to the 
post-enumeration survey, specifically reverse record 
checks and systematic observation. 

iv. Wolter emphasizes the necessity for the develop- 
ment of a fast and accurate matching algorithm 
whether or not the 1990 PES is to be used for 
adjustment or coverage evaluation. 

Record matching forms an essential part of most of the
existing workable coverage evaluation methodologies,
including a PES or a reverse record
check. The panel is in full agreement with the spirit of
the above statement emphasizing the importance of the
development of matching capabilities. Much of the
research ongoing at the Census Bureau to expedite the
matching process is devoted to the development of
algorithms for computer matching. The panel applauds
these efforts.

Wolter's paper bases a great deal of the adjustment 
decision on the successful development of a fast matching 
algorithm. To quote Wolter (1984:6, emphasis in
original):

A major assumption underlying both the research 
program and the decisions set forth here is that 
fast and accurate matching techniques will be 
developed. ... It is already clear to us that 
there is no fall-back position if we fail to 
develop an accurate matching methodology. In this 
circumstance, the Census Bureau will not have the 
means of adjusting the 1990 census so as to 
improve those census data in any sense. 

The strength of this statement necessitates some quantifi- 
cation of what a fast and accurate matching algorithm 
actually is capable of doing. Once this quantification 
has been made, if it then appears likely that fast and 
accurate matching will not be possible for 1990, we en- 
courage the Census Bureau to investigate and develop 
possible fallback procedures that could then be con- 
sidered for use. 

Recommendation 8.2. We agree that matching algorithms 
are very important to the success of several adjustment 
methods. We recommend that the Census Bureau inves- 
tigate the development of a fallback position in case 
adequate matching is not available in 1990. 

v. Finally, as a first step in the process toward a
decision on adjustment, Wolter calls for
summarization of current evaluation studies from the 1980
decennial census.

The Census Bureau has completed a number of studies based 
on the 1980 census that, when summarized, promise to pro- 
vide useful information pertaining to coverage evaluation 
and possible adjustment of future censuses. There are a 
number of other studies as yet uncompleted or unreported 
that would also yield important information on strategies 
for coverage evaluation. For example, the Census/CPS/IRS
Match Study provides a three-way match that could be used
to form estimates for certain subgroups of the population.
Estimates using this three-way match might have smaller
variance and possibly smaller bias than estimates using
the two-way match performed in the PEP. Other studies,
e.g., the Demographic Analysis of National PEP Estimates,
Local Area Estimation Research, and the Exploratory
Analysis of PEP Data (Hogan, 1984a), have direct implications
for the feasibility of adjustment procedures.

The panel supports Wolter in urging that the above
summarization be prepared and that the Census Bureau allocate
sufficient staff resources to this task. However, the
panel is also concerned that important studies from the 
1980 evaluation program may not be completed or fully 
documented. The results have potential implications with 
respect to the effective design of other field tests cur- 
rently being planned. The panel has an overall concern 
that the history of tests completed by the Census Bureau 
has not always been available to help in the design and 
consideration of new tests. 

Recommendation 8.3. We recommend that the Census 
Bureau complete and report analyses of 1980-based 
tests related to coverage evaluation, especially the 
Census/CPS/IRS Match Study. 



THE 1990 DEMOGRAPHIC ANALYSIS PROGRAM: POSSIBLE 
IMPROVEMENTS 

Demographic analysis requires data from sources, inde- 
pendent of the current census, to estimate the number of 
persons in a given age-race-sex category. The cor- 
responding number recorded in the census can be evaluated 
by comparison with the demographic approximation. The 
simplest form of such analysis is illustrated by the 
construction of the estimated number of white females 
ages 20-24 in 1990 from: 

(1) The number of white female births from April 1,
1966, to April 1, 1970;

(2) The number of white female immigrants from April
1, 1966, to April 1, 1990, whose age on arrival
would place them into the target age group as of
April 1, 1990;

(3) The number of deaths prior to April 1, 1990,
occurring in the United States to all white female
residents born in the time period April 1, 1966,
to April 1, 1970; and

(4) The number of white female emigrants born during
the target period. The group includes both
persons born in the United States and those who
migrated there.

The number of births is determined from birth
registration data adjusted for the estimated proportion of
underregistration; the number of deaths is the registered
number; and the number of legal immigrants is derived
from Immigration and Naturalization Service statistics.
The number of emigrants is unknown and is estimated from
a variety of fragmentary information, mostly from
immigration data of other countries and cohort analysis of
consecutive censuses.
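
The basic calculation described above amounts to simple cohort
bookkeeping, sketched below. The sketch assumes that births are
inflated by an estimated registration completeness and that the
other three components are taken as given. The function names
and all of the figures in the usage note are hypothetical.

```python
def demographic_estimate(registered_births, birth_completeness,
                         immigrants, deaths, emigrants):
    """Expected size of a cohort on Census Day.

    registered_births:  births recorded for the cohort
    birth_completeness: estimated share of births that were
                        registered (for example 0.98)
    immigrants, deaths, emigrants: cohort members entering,
                        dying, or leaving before Census Day
    """
    births = registered_births / birth_completeness
    return births + immigrants - deaths - emigrants


def net_undercount_rate(expected, census_count):
    """Net undercount as a share of the demographic estimate.

    A negative value indicates apparent overcoverage.
    """
    return (expected - census_count) / expected
```

For example, 980 registered births at 98 percent completeness,
50 immigrants, 30 deaths, and 20 emigrants give an expected
cohort of about 1,000. A census count of 970 would then imply a
net undercount of about 3 percent. Any error in the emigration
figure passes directly into the estimate.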

This basic form of calculation is applicable only to 
persons born in 1935 or later, because well-founded 
estimates of completeness of registration of births begin 
in 1935. Other forms of analysis have been used for 
cohorts born before 1935. For persons over age 65, 
Medicare files provide reliable data on the size of that 
population. For persons between 45 and 64 for the 1980 
census, and between 55 and 64 for the 1990 census, more 
complex procedures attempt to estimate the true size of a 
cohort at each census date by pooling information about 
the number of persons recorded in the cohort in several 
censuses, making allowance for the estimated differential 
overall completeness of different censuses, for broadly 
similar but systematically evolving patterns of age- 
misreporting, and for differential undercounts by age. 

As indicated in Chapter 4, the main weaknesses of 
demographic analysis are the following: 

(1) No subnational estimates of undercount are
available (and it is the geographically differential
undercounting that leads to possible inequities in
apportionment and fund allocation);

(2) No estimate of the undercount for Hispanics can be
constructed because Hispanic groups, until very
recently, have not been identified in birth and
death registrations and are not identified in
immigration records;

(3) It is necessary to use relatively crude and
largely unverifiable methodology in estimating
emigration;

(4) There are no sufficiently accurate estimates
available on the number of illegal immigrants; and

(5) There are no available estimates for the
reliability of the various component estimates.

Points (1) and (2) above, unlike (3) and (4), limit the
available detail provided by demographic analysis but do
not affect the reliability of the resulting estimates.
Notwithstanding (3), (4), and (5), the method is generally
thought to have provided better national estimates of
undercounts by age, sex, and a limited breakdown of race
for the censuses of 1950, 1960, and 1970 than did the
post-enumeration surveys.

Demographic analysis was found less useful in evaluating
the completeness of the coverage of the 1980 census.
This is attributed primarily to the large number of
unrecorded immigrants who are thought to have entered the
United States during the 1970s (see Appendix 8.1).
Another problem with the application of demographic
analysis to the 1980 census is that the methodology of
treating race, particularly Hispanics, was changed. This
change created difficult problems of consistency with
other data sources, including earlier censuses. The
unknown number of emigrants continued to be a problem in
1980. Nevertheless, demographic analysis remained useful
for those groups less affected by these shortcomings,
particularly blacks. For blacks it is believed that
demographic analysis provided a reasonable measure of
undercount by age; however, it failed for whites and
failed to provide estimates for Hispanics. Thus, it no
longer provided reasonable measures of the differential
undercount by race.

A useful modification of the procedure seems to be to
apply demographic analysis separately to persons born in
the United States and to the foreign born, provided the
reliability of reporting of country of birth is high
enough. One advantage is the potential availability of
good national estimates by age and sex for the native
white and native black populations, at least up to age [...]
(i.e., for persons born in 1935 or later). The only
estimates of international migration needed for this
group are allowances for the movements of persons born in
the United States. A portion of this movement could be
inferred from immigration information from other
countries. Estimates of emigration could also be derived, as
a by-product, from a reverse record check, if one is
carried out in conjunction with the 1990 census, or
perhaps by a multiplicity-sampling approach incorporated
in the Current Population Survey. If this modification
is successful, the resulting demographic estimates could
presumably be used to check the results of a reverse
record check or a PES, or could be used as a benchmark
for those methods as they relate to persons born in the
United States.

The Census Bureau should investigate the value of this
native-born approach to modifying demographic analysis.
Of course, an analysis of the quality of the information
on reported place of birth would be required. The
value of PES information on place of birth should also be
investigated.

Recommendation 8.4. We recommend that the Census 
Bureau conduct research into using demographic analysis 
to develop estimates of coverage for the native-born 
population. The research should consider whether 
these estimates could usefully be combined with other 
estimates of coverage. 



THE REVERSE RECORD CHECK PROGRAM: CONSIDERATIONS FOR 1990 

A reverse record check methodology has been used by 
Statistics Canada since 1961 in its assessment of the 
completeness of the coverage of its censuses. This 
procedure is described in Chapter 4, so we summarize only 
the basic methodology here. 

A reverse record check is an evaluation program in 
which a sample of the population is drawn from a frame 
created prior to the census and traced forward to the 
time of the census. The proportion of the sample that is 
determined through tracing to be residing in the United 
States on Census Day provides an estimate of the total 
population. Usually, the sample is a combination of
samples from the following four lists: (1) the previous
census, (2) births in the intercensal period, (3)
immigrants from the intercensal period, and (4) people missed
in the previous census as determined from the previous 
coverage evaluation program. This technique has not been 
used extensively in the United States. 
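
The estimation step described above can be illustrated
with a minimal sketch. It assumes simple random sampling
within each of the four lists and ignores cases for which
tracing fails; the class name, function name, and all
counts are hypothetical and are not taken from any
Statistics Canada or Census Bureau program.

```python
from dataclasses import dataclass


@dataclass
class FrameSample:
    """Traced sample from one of the four lists (all counts hypothetical)."""
    name: str
    frame_size: int        # records on the full list the sample was drawn from
    resident: int          # traced and found to be resident on Census Day
    not_resident: int      # traced as having died or emigrated
    found_in_census: int   # residents whose census record was located


def reverse_record_check(samples):
    """Return (estimated resident population, estimated undercoverage rate).

    Assumption: simple random sampling within each list, and
    tracing-failed cases are simply left out of the sample.
    """
    residents = 0.0
    enumerated = 0.0
    for s in samples:
        traced = s.resident + s.not_resident
        weight = s.frame_size / traced  # list records represented by each traced case
        residents += weight * s.resident
        enumerated += weight * s.found_in_census
    return residents, 1.0 - enumerated / residents


SAMPLES = [
    FrameSample("previous census", 1_000_000, 900, 100, 880),
    FrameSample("intercensal births", 150_000, 145, 5, 140),
    FrameSample("intercensal immigrants", 50_000, 40, 10, 36),
    FrameSample("missed in previous census", 20_000, 18, 2, 14),
]

POPULATION, UNDERCOVERAGE = reverse_record_check(SAMPLES)
print(f"estimated residents on Census Day: {POPULATION:,.0f}")
print(f"estimated undercoverage rate: {UNDERCOVERAGE:.3%}")
```

Each list is weighted separately because the four lists
are sampled at different rates; the undercoverage rate is
the weighted share of traced residents for whom no census
record is found.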

Compared to post-enumeration surveys of the kind
conducted by the United States to evaluate its censuses, the
reverse record check seems to offer several advantages:




(1) Unlike the "do it again, but better" method, it
does not rely on the assumption that the
post-enumeration survey might succeed very much better
where the census failed. And unlike the "do it
again, but independently" method, it does not have
to rely on the unverifiable and unlikely assumption
that the events of being missed by the census and
by the post-enumeration survey are independent.

(2) The coverage error estimates do not depend in a
major way on matching errors, a significant source
of vulnerability of "do it again, but
independently" methods of the type carried out after the
1980 census and planned for the 1990 census.

(3) The reliability of 1980 United States census
coverage evaluation is significantly affected by
nonresponse in the post-enumeration survey. There
is no nonresponse in the reverse record check per
se. There is an analogous category of "tracing
failed" cases, but here again the reverse record
check has some advantages in that the tracing of the
residual cases can be (as it is in Canada) carried
out over several months, as opposed to the tight
time schedule of the field work of post-enumeration
surveys.

(4) Imputation in the post-enumeration survey cannot
be validated. By contrast, imputing for
tracing-failed cases can be partially assessed by
reference to independent control totals. Indeed, the
reverse record check provides an estimate of the
number of persons who died since the previous
census, a verifiable number. After matching with
the census, the method also provides an estimate
of the number of persons enumerated in the
census, another verifiable number.

(5) The reverse record check provides a direct estimate
of the number of emigrants since the last census,
which can be used to overcome one of the
significant data gaps of demographic estimation, both to
evaluate the current census and as a benchmark for
intercensal population estimation.

One problem for the reverse record check method is the
lack of records for undocumented aliens, so that they
cannot be represented in the reverse record check.
Another significant disadvantage of the reverse record
check is the need for the tracing operation. However,
with a 5-year gap between censuses, the 5 percent
tracing-failed rate achieved in Canada compares favorably with
the over 8 percent imputation needed in the 1980
evaluation program used in the United States.

The panel believes the Census Bureau's experimental
initiative called the "Forward Trace Study" may provide
some information as to ways of overcoming the problem
posed by the 10-year intervals between censuses in the
United States. As discussed above, the Forward Trace
Study is testing three modes of tracing a sample of
individuals counted in the previous census, a sample of
individuals missed in the previous census, and a sample
of intercensal immigrants. The outcome of this study may
help determine an effective method for tracing people in
the United States. As indicated in Recommendation 8.1,
the panel is concerned that a reverse record check be 
given more attention as a potential coverage evaluation 
methodology in 1990. Assuming that Recommendation 8.1 is 
persuasive and the decision is made to proceed in 1990 
with a reverse record check in either a testing mode or 
as a primary coverage evaluation program, it is then 
necessary to know very soon which of the three versions 
of tracing will be used. If it happens that either of 
the methods for more intensive tracing in the Forward 
Trace Study wins out over tracing at the end of the 
period, the intensive tracing must begin by 1986 in order
to benefit from the shortened period between contacts.
Therefore, the samples need to be drawn by 1986.

Recommendation 8.5. We recommend that the Census 
Bureau move quickly to complete the Forward Trace 
Study to determine the feasibility of using forward 
trace methods in a reverse record check program for 
1990. If the methodology is effective, a national 
sample for this purpose needs to be initiated by 1986. 



THE 1990 POST-ENUMERATION PROGRAM: POSSIBLE IMPROVEMENTS 

Recent Census Bureau reports indicate that a type of
post-enumeration survey will be the predominant component
of the coverage evaluation effort in the 1990 decennial
census, as it was in 1980. Assuming this and given the
weaknesses of the 1980 version of this program outlined
in Chapter 4, what possibilities are there for
improvement in the Post-Enumeration Program for 1990?

There are two purposes for which a post-enumeration
survey might be used. The first is to evaluate coverage,
for example to identify subgroups of the population, by
state and major city, that were disproportionately missed
by the census. The second is to use the results for
purposes of adjusting the population counts of states, major
cities, and smaller geographic regions. These two
purposes of coverage evaluation and adjustment overlap to a
considerable degree. It is this second purpose,
adjustment, on which we concentrate. We consider possible
areas for improvement to the techniques of the 1980
Post-Enumeration Program; however, any improvements to the
Post-Enumeration Program as a potential means of adjustment
are clearly improvements to it as a coverage evaluation
program.

We organize this section as follows. First, we
provide a description of the general procedure used in
the 1980 Post-Enumeration Program. Then we identify
features of the Post-Enumeration Program in which
worthwhile gains appear to be possible. For each feature
identified, possible approaches for improvement are
discussed.



The 1980 Post-Enumeration Program 

As a coverage evaluation program, the 1980
Post-Enumeration Program was useful in identifying demographic
subsets of the population, by state and major city, that
were disproportionately missed by the census. For
example, the 1980 Post-Enumeration Program indicated
that, nationally, blacks and nonblack Hispanics were
missed more frequently than whites. In addition, it
provided considerable information about erroneous
enumerations, duplications, and incorrectly geocoded addresses,
which indicated limitations of the decennial census
methodology (see Wolter, 1983; Cowan and Bettin, ] 
Thus, the Census Bureau derived a substantial amount of
information on the quality of the 1980 decennial census
data set as well as information about which populations
to direct its energies to for coverage improvement in
1990. In this sense, the 1980 Post-Enumeration Program
can be seen as a continuation of, and improvement on, the
methods used for coverage evaluation in the 1950, 1960,
and 1970 decennial censuses.

Chapter 4 contains a detailed description of the
Post-Enumeration Program. However, for convenience we
repeat the overall strategy here. The basic idea is to
recount independently a sample of households, and
subsequently match individuals included in the two
enumerations to determine those missed by the census but
included in the recount. An estimation model, often
referred to as capture-recapture, or dual-system
estimation, was then applied to supplement the direct
coverage estimates by adding an estimate of the number of
individuals missed by the census. The Current Population
Survey was the enumeration system used to perform this
recount in 1980, and in this context was called the P
sample. Although
the sampling frame for the Current Population Survey is 
not independent of the decennial census, it undergoes 
sufficient changes over the intercensal period so that 
the listing of addresses used is fairly distinct from 
that of the decennial census (see Bureau of the Census, 
1978a). This, along with the independence of surveying
operations in the Current Population Survey and the census 
helps promote the desired independence. The P sample 
included about 185,000 persons each for April and August 
1980. 

It was possible to search the census files for matches 
of individuals enumerated in the Current Population 
Survey. However, the search had to be restricted to a 
limited geographic area. Thus a person counted by the 
census within the "wrong" area (as per Current Population 
Survey definitions and operations) appeared at the 
conclusion of this match as if he or she were missed by 
the census. Due to the sampling design of the Current 
Population Survey, which did not make use of compact area 
clusters, it was essentially impossible to search the 
Current Population Survey files for individuals counted 
in the census. (This would then be a two-way match.) As 
a result, there was no mechanism in the P sample, by 
itself, for checking the validity of census enumerations. 
Invalid or erroneous census enumerations include not only 
improperly geocoded census addresses, but also curbston- 
ing, individuals who should not have been included in the 
census, such as foreign visitors and people who were born 
after Census Day, duplicate enumerations, etc. The need 
to measure the frequency of these problems gave rise to a 
second sample, this time a sample of 100,000 indi- 
viduals from the decennial census itself, called the E 
sample. The latter sample was used partly to "balance 
out" from the P sample the contribution of persons in- 
cluded in the census but at the wrong address and partly 
to estimate the number of persons erroneously included in 
the census in order to derive, with dual-system
estimation, net under- or overcount estimates. We note that
there may be less need for the E sample in 1990 due to
the possibility (mentioned above) of the use of a sample
of geographically compact clusters for the PES, since in
that case two-way matching may be feasible.



Improving the 1980 Post-Enumeration Program Methodology

We have identified four aspects of the 1980
Post-Enumeration Program that might benefit from special
attention, although we do not necessarily have unambiguous
recommendations to offer in every instance:

(1) Reduction in the level of survey nonresponse;

(2) Reduction in the percentage of unresolved matches;

(3) Improvement in methods to balance the local
undercount with the overcount; and

(4) Estimation of the degree of independence between
survey and census.

Reduction in the Level of Survey Nonresponse 

Like any sample survey, the survey used for the
Post-Enumeration Program will suffer from an imperfect
sampling frame and interview refusals. As mentioned in
Chapter 4, over 4 percent of the Current Population
Survey interviews in April 1980 were refusals. Even when
an interview is conducted, a lack of detailed information
on address or, to a lesser extent, age, sex, and race for
a record can create situations in which the status of a
match with the census is unclear. The resulting problem
of a large percentage of unresolved matches is addressed
in the next section. Here we are concerned with people
in the sample for the PEP, or who would have been in the
sample had the sampling frame been complete, for whom no
information was collected. This is a central issue, since
there is a possibility that the same types of people who
are missed in the census are either missing from the PEP
sampling frame or refuse to cooperate with the PEP
interviewer.

It would, obviously, be highly desirable to decrease
the rate of refusal. In 1980, the Census Bureau used the
April and August Current Population Survey samples as the
P sample of PEP and utilized essentially the same Current
Population Survey procedures as for other months.
Therefore, one possibility for reducing refusal, assuming that
the Current Population Survey is again used for the PEP,
is to employ more intensive follow-up than is usually
done, perhaps after the end of the regular survey week of
the CPS. The possibility of making cooperation legally
required should also be explored. This approach may
introduce a discontinuity into the time series of
employment and unemployment estimates, although this risk might
be reduced by appropriate measures. An alternative,
currently under consideration by the Census Bureau and 
discussed above in the critique of the paper by Kirk 
Wolter, is the use of a separate survey for coverage 
evaluation. Should a separate survey be used, it might 
still be highly desirable to employ experienced Current 
Population Survey interviewers during the non-CPS weeks. 



Reduction in the Percentage of Unresolved Matches 

In the 1980 Post-Enumeration Program, after completing 
a Current Population Survey interview, Census Bureau staff 
geocoded the address of each sample residence to determine 
the enumeration district in which that residence should 
have been placed in the 1980 decennial census. Then that 
(and only that) enumeration district was searched cleri- 
cally for a name-address-race-sex-age combination that 
matched, according to defined criteria, a record from the 
Current Population Survey. Each Current Population Survey 
interview was categorized in one of three ways: matched 
with the census, not matched with the census, and match 
status unresolved. This last group is the most trouble- 
some, at least if one assumes that errors involving the 
first two categories are well controlled. These cases 
can easily give rise to very significant matching errors, 
and hence errors in the estimated undercount. 

When the April 1980 Current Population Survey was used, 
matching status could not be determined for approximately 
8.5 percent of cases. This was due to a variety of 
causes, especially incomplete responses, response errors 
in either the census or the CPS, refusals to respond to 
the Current Population Survey, and ambiguities related to 
addresses (particularly in rural areas). Use of the
August Current Population Survey resulted in over 10 
percent of cases with unresolved matching status, larger 
than the April CPS presumably because of the problems 
introduced by mobility (see Wolter, 1983). In order to 
derive estimates of the number of persons missed, a match 
status had to be imputed to the unresolved cases. Over 
30 percent of the April imputations resulted from an 
incomplete follow-up interview of CPS interviewees who
were not initially matched with the census.[2] Depending
on the method of imputation used (combined with some other
factors), the Census Bureau generated 12 different sets
of PEP estimates of the undercount for states and major
cities. These estimates appeared sensitive to the method
of imputation (and other factors) used (see the discussion
in Chapter 4).

Improvements to the geographic system in 1990 may be
helpful in reducing the number of unresolved matches. The
Census Bureau's TIGER (Topologically Integrated Geographic
Encoding and Referencing) system, currently under
development, could very well represent a substantial
improvement over previous geographic systems, and could be in
place by 1990. However, any resulting benefits from this
new system would be dependent, to a large extent, on the
quality of the responses that are to be coded. There are
also efforts by the Census Bureau to avoid the necessity
of geocoding, by treating the address as an alphanumeric
response, which can then be used to block the census data
set for matching in ways that do not require knowledge of
the precise enumeration district of the individual.
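
As an illustration only, blocking on the address treated
as an alphanumeric string might look like the following
sketch. The normalization rules, the abbreviation table,
the record layout, and the function names are all
assumptions made for this example; they do not describe
the Census Bureau's actual procedure.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; a production system would need a far
# richer set of standardization rules.
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD", "APARTMENT": "APT"}


def address_key(address):
    """Normalize a free-text address into a crude blocking key."""
    text = re.sub(r"[^A-Z0-9 ]", " ", address.upper())
    tokens = [ABBREVIATIONS.get(token, token) for token in text.split()]
    return " ".join(tokens)


def build_blocks(census_records):
    """Group census records by blocking key, with no geocoding step."""
    blocks = defaultdict(list)
    for record in census_records:
        blocks[address_key(record["address"])].append(record)
    return blocks


def candidates(survey_record, blocks):
    """Return the census records that share the survey record's block."""
    return blocks.get(address_key(survey_record["address"]), [])


CENSUS = [
    {"name": "J SMITH", "address": "12 Main Street, Apt. 3"},
    {"name": "M JONES", "address": "48 Oak Avenue"},
]
BLOCKS = build_blocks(CENSUS)
SURVEY_CASE = {"name": "JOHN SMITH", "address": "12 main st apt 3"}
print(candidates(SURVEY_CASE, BLOCKS))
```

Only the records that fall in the same block are then
compared on name, age, sex, and race, so the search no
longer depends on assigning the survey address to the
correct enumeration district.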

Fractional matching is an idea that could be explored
as an alternative method of imputing match status to the
cases for which the match status was unresolved. Assume
that the likelihoods outlined in Appendix 4.2 from the
Fellegi-Sunter mathematical model for matching are stored
and available for the cases that are left unresolved,
along with the several likely matches for these unresolved
cases. It is conceivable that a function of these
likelihoods could be developed empirically that would impute
to each unmatched post-enumeration survey record (by
computer algorithm and suitable personal follow-up) a
fractional match status in such a manner that the sum of
these fractions is equal to the unknown number of matched
cases. Fractional matching is therefore merely a model
relating match status to likelihoods from some model,
e.g., the Fellegi-Sunter model. Assuming a computer
matching success rate of 60-70 percent (perhaps an
optimistic rate), this approach, without clerical
assistance, would result in a massive imputation of match
status. Given the substantial impact on undercoverage
estimates of imputing match status to only about 8 percent
of cases, such a major increase in the reliance on
imputation cannot be recommended on the basis of our
current state of knowledge. However, there is the
possibility of using fractional matching solely for those
cases for which the match status is either very likely or
very unlikely, leaving the remainder for clerical
follow-up. Finally, its use to impute match status to the
residual number after clerical follow-up of unresolved
cases should be explored.

[2] From conversations with Robert Fay, III,
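
The fractional matching idea can be sketched as follows.
The text leaves the function of the likelihoods to be
developed empirically; the posterior-probability form
used here, the prior odds, the cutoffs, and the sample
likelihood ratios are assumptions chosen purely for
illustration.

```python
def fractional_match_status(likelihood_ratios, prior_odds=1.0):
    """Fraction in [0, 1) imputed to one unresolved survey record.

    `likelihood_ratios` holds one match/non-match likelihood ratio per
    candidate census record. Assumption: at most one candidate is the
    true match, so the odds of having any match are the summed odds.
    """
    odds = prior_odds * sum(likelihood_ratios)
    return odds / (1.0 + odds)


def route(fraction, low=0.05, high=0.95):
    """Impute only near-certain cases; send the rest to clerical review."""
    return "impute" if fraction <= low or fraction >= high else "clerical"


# Hypothetical unresolved cases and their candidate likelihood ratios.
UNRESOLVED = {
    "A": [50.0, 0.2],   # one strong candidate, one weak one
    "B": [1.0],         # evidence evenly balanced
    "C": [0.01, 0.02],  # only very weak candidates
    "D": [],            # no candidate found at all
}

FRACTIONS = {case: fractional_match_status(r) for case, r in UNRESOLVED.items()}
EXPECTED_MATCHES = sum(FRACTIONS.values())
ROUTES = {case: route(f) for case, f in FRACTIONS.items()}

for case in UNRESOLVED:
    print(case, round(FRACTIONS[case], 3), ROUTES[case])
print("imputed number of matched cases:", round(EXPECTED_MATCHES, 3))
```

The sum of the fractions plays the role of the imputed
number of matched cases, and `route` reflects the more
cautious option of restricting imputation to cases whose
status is very likely or very unlikely.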

Another suggestion that has been made is for the Census
Bureau to subsample the unresolved cases in order to
concentrate efforts on them. There are two possible
applications of this idea. The first is to sample from all
cases unmatched by the computer algorithm. Not all members
of the panel favor this idea. The use of sampling of all
the matches left unresolved by the computer algorithm
would result in estimates of undercoverage with
substantially increased variances for important subpopulations
and subnational regions. The second notion is to sample
after a first-stage personal follow-up of unmatched cases 
has been attempted. The advantages of this approach 
parallel those discussed in Chapter 6 on sampling for 
follow-up. The full panel endorses this idea. 

We point out that the use of especially intensive
interviewing, discussed in the previous section, should
improve the reporting of identifying information and
hence might reduce the problem of individuals with
unresolved match status. Finally, the use of computer
matching might permit an extension of the area of search 
with the census file for each PES sample case, as well as 
the use of more matching variables and more advanced 
methodology. These improvements may well result in a 
significant reduction of the nonmatch rate. 



Improvement in Methods to Balance Local Undercount
With Overcount [3]

In the 1980 version of the Post-Enumeration Program,
the E sample was used to estimate the genuine overcount
of the census. It was also used to offset the Current
Population Survey sample cases that could not be matched
to the census within the local area to which matching was
restricted, often as a result of faulty determination of
census geography, or indistinct addresses.

[3] The terms undercount and overcount are understood here
to mean gross undercount and gross overcount.

As described above in Chapter 4, the form of the
dual-system estimate used in the 1980 Post-Enumeration
Program for a particular demographic stratum was as
follows (see Cowan and Fay, 1984):

$$\hat{N}_T = N_P \, \frac{N_C - E - G - D - I}{M}$$

where $\hat{N}_T$ is an estimate of the total population,
$N_P$ is the weighted sample total of the number of
persons in the P sample, $M$ is the weighted number of
persons in both the census and the P sample, $N_C$ is the
census count of persons, $E$ is the weighted number of
persons who were census erroneous enumerations from the E
sample, $G$ is the weighted number of persons in
incorrectly geocoded housing units in the census from the
E sample, $D$ is the weighted number of duplicate counts
in the census from the E sample, and $I$ is the count from
the census of field-related imputations.

The four subtracted quantities are therefore (E)
people who were counted in the census who should not have
been, e.g., people born after Census Day, (G) people who
were counted in the census but placed in the wrong area
and therefore, given the blocking used in the match, were
incapable of being matched, (D) people who were counted
in the census more than once in the same enumeration
district, and (I) people who were imputed into the
census, e.g., people for whom no questionnaire was
returned or residences that were imputed to be occupied.
An error for these four quantities substantially less
than the magnitudes being measured is necessary for a
reasonable estimate of those missed by the census, since
the magnitudes of the quantities being added and
subtracted are of the same order as the undercount. From
Cowan and Fay (1984) we have the following national
percentage rates for the above four quantities:

Quantity                       Rate (percent)

Erroneous Enumerations (E)          1.6
Geocoding Errors (G)                1.0
Duplicates (D)                      0.8
Imputations (I)                     1.3
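
To make the arithmetic concrete, the following sketch
evaluates the estimator for a single stratum. The census
count and the P-sample totals are hypothetical, and
applying the national rates above to one stratum as
percentages of the census count is an assumption made
only for illustration.

```python
def pep_dual_system_estimate(n_p, m, n_c, e, g, d, i):
    """Dual-system estimate N_P * (N_C - E - G - D - I) / M for one stratum."""
    if m <= 0:
        raise ValueError("the weighted number of matched persons must be positive")
    return n_p * (n_c - e - g - d - i) / m


# Hypothetical stratum: census count and weighted P-sample totals.
N_C = 1_000_000
N_P = 1_000_000
M = 940_000

# National percentage rates quoted in the text, applied to N_C (an assumption).
RATES = {"E": 1.6, "G": 1.0, "D": 0.8, "I": 1.3}
CORRECTIONS = {name: N_C * rate / 100.0 for name, rate in RATES.items()}

ESTIMATE = pep_dual_system_estimate(
    N_P, M, N_C,
    CORRECTIONS["E"], CORRECTIONS["G"], CORRECTIONS["D"], CORRECTIONS["I"],
)
NET_UNDERCOUNT = (ESTIMATE - N_C) / ESTIMATE

print(f"estimated total population: {ESTIMATE:,.0f}")
print(f"net undercount rate: {NET_UNDERCOUNT:.2%}")
```

In this example the four subtracted quantities total 4.7
percent of the census count while the resulting net
undercount is under 1.5 percent, which shows why errors
in the subtracted quantities must be small relative to
the quantity being estimated.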

The question thus becomes, how can this balancing of the
undercount and the overcount be reduced or eliminated
from the Post-Enumeration Program estimation process?




The Census Bureau has recently advanced one
possibility, discussed above, for avoiding the necessity of
balancing (see Wolter, 1984). The idea is to use an
independent survey in the Post-Enumeration Program in
place of the Current Population Survey. The independent
survey would sample geographically compact clusters and
check for both over- and underenumeration in the same
clusters using a two-way match. With a two-way match,
(1) census duplicates are easier to find by checking
post-enumeration survey records that match with more than
one census record and (2) census erroneous enumerations
and mis-geocoded records can be estimated from an
examination of census records that do not match to any
post-enumeration survey records. This avoids the need
for intricate assumptions of balancing errors. In 
addition, local area estimates of net undercount could be 
exploited in model development (through the use of local 
area characteristics as auxiliary variables) as well as 
in model validation (by comparing for a subsample of 
small areas the direct and model-based estimates of 
undercount). As indicated above, the panel is in favor 
of this proposal. 



Estimation of the Degree of Independence Between 
Survey and Census 

A major and untested assumption of the 1980 Post- 
Enumeration Program is that, for each person, the events 
of being included in the census and the Current Population 
Survey are independent. However, there is evidence sup- 
porting the belief that many of the types of individuals 
missed in the census are also missed disproportionately 
by the Current Population Survey and, for that matter, by 
any type of household survey technique. For example, the 
CPS estimates of young males, particularly blacks, are 
consistently below the corresponding demographic esti- 
mates. The ethnographic study sponsored by the Census 
Bureau provided additional evidence of this phenomenon 
(see Valentine and Valentine, 1971). This lack of inde- 
pendence of inclusion in the evaluation survey and the 
census may be particularly likely for persons with 
tenuous or irregular household connections, for undocu- 
mented aliens, and for other groups who have reason to 
avoid visibility of any sort. For these people, the 
frequency of being missed by both the census and the 
survey may be substantially different than would be
indicated if these events could be regarded as
probabilistically independent. (Equivalently, the probability of
inclusion in the census, given inclusion in the
post-enumeration survey, may not equal the unconditional
probability of inclusion in the census.) Thus k, the
parameter mentioned in Chapter 4, may be substantially
different from 1.

As mentioned in Chapter 4, the Census Bureau made use
of a stratified dual-system estimator; that is, the
population is first stratified using certain demographic
characteristics, then the dual-system estimator is applied
separately within strata. This serves two purposes.
First, the k's for each subtable formed with
stratification may all be closer to 1 than for the unstratified
case. (However, the dependence is still likely to be
substantial.)

Second, the stratification helps keep the probability
of inclusion constant within strata. This is an
assumption often used in the model underlying dual-system
estimation. These two assumptions, independence of the
inclusion probabilities for the two lists, and
equality of the inclusion probabilities within lists
(within strata), both need to be carefully examined. The
assumptions are at least partially confounded, so that
study of the degree of validity or robustness of the
independence assumption will be enhanced by
studying the degree of validity or robustness of the
equality assumption. It is possible to examine the
individuals missed by the census to see whether they
differ with respect to various covariates not used in the
determination of the strata. The extent of these
differences would then be a test for equality of inclusion
probabilities. However, if the individuals examined are
only those caught by the post-enumeration survey, the
results may be affected by any nonindependence of the
census and post-enumeration survey. Therefore, an
important factor is the gathering of information on
individuals missed by both the census and the
post-enumeration survey.
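
The confounding of the two assumptions can be seen in a
small numerical sketch. It uses expected counts rather
than sampled data, and the two strata, their sizes, and
their inclusion probabilities are hypothetical. Within
each stratum the census and the survey are independent
with constant inclusion probabilities; pooling the strata
nevertheless produces apparent dependence and an estimate
that is too low.

```python
def dual_system(n_census, n_survey, n_both):
    """Basic dual-system (capture-recapture) estimate of population size."""
    return n_census * n_survey / n_both


# Hypothetical strata: true size and inclusion probabilities, with the
# census and the survey independent WITHIN each stratum.
STRATA = [
    {"N": 900_000, "p_census": 0.98, "p_survey": 0.95},
    {"N": 100_000, "p_census": 0.80, "p_survey": 0.70},
]


def expected_counts(stratum):
    """Expected numbers in the census, in the survey, and in both."""
    n_census = stratum["N"] * stratum["p_census"]
    n_survey = stratum["N"] * stratum["p_survey"]
    n_both = stratum["N"] * stratum["p_census"] * stratum["p_survey"]
    return n_census, n_survey, n_both


COUNTS = [expected_counts(s) for s in STRATA]
TRUE_TOTAL = sum(s["N"] for s in STRATA)

# Stratified estimate: apply the estimator within strata, then add.
STRATIFIED = sum(dual_system(*c) for c in COUNTS)

# Pooled estimate: add the counts first, then apply the estimator once.
POOLED_COUNTS = [sum(column) for column in zip(*COUNTS)]
POOLED = dual_system(*POOLED_COUNTS)

# In the pooled data, inclusion in the census given inclusion in the
# survey no longer equals the unconditional census inclusion rate.
CONDITIONAL = POOLED_COUNTS[2] / POOLED_COUNTS[1]
UNCONDITIONAL = POOLED_COUNTS[0] / TRUE_TOTAL

print(f"true total {TRUE_TOTAL:,}")
print(f"stratified {STRATIFIED:,.0f}   pooled {POOLED:,.0f}")
print(f"P(census | survey) {CONDITIONAL:.4f}   P(census) {UNCONDITIONAL:.4f}")
```

Unequal inclusion probabilities across the pooled groups
are enough to make the two lists look dependent, which is
why stratification addresses both assumptions at once and
why any heterogeneity remaining within strata leaves some
dependence in place.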

A method that should be tested for its potential to
count some of the individuals who are typically missed by
the counting methods used by censuses and surveys is the
reverse record check. Systematic observation, discussed
below, should also be tested for this purpose. Within
the context of the Post-Enumeration Program, an
approach that deals with certain aspects of this
exceptionally difficult problem is triple system estimation
(see Marks et al., 1977, and Appendix 4.1). In this
approach, the independence assumption for two lists is
often replaced by an assumption of conditional
independence involving three lists. Validation of the assumption
of conditional independence would be needed.
Unfortunately, there is at present no "third" system with a
reasonably complete coverage of the population of the
United States. A union of suitably selected
administrative records might be envisaged, but various problems,
outlined in Chapter 4, make this possibility appear
unlikely for the immediate future.

Ericksen and Kadane (1985) and Fellegi (1985) emphasize 
the sensitivity of dual-system estimation to the assump- 
tion of the independence of inclusion frequencies. 
Ericksen and Kadane (1985) propose a method that may be 
applicable to some special groups. They argue that, for
blacks in the 1970 census, the probability of inclusion 
in the census, given inclusion in the post-enumeration 
survey, was not equal to the probability of inclusion in 
the census, as the assumption of independence would 
indicate, but was instead greater than twice the probabil- 
ity of inclusion in the census. The method used assumed 
that the demographic estimate of the national black under- 
count was correct for 1970. The general applicability of 
this approach is limited since the estimation of k would 
require knowledge of the total population which is the 
end objective in wishing to estimate k in the first 
place. Furthermore, Fellegi (1985) argues that the 
numerical stability of their estimate is not good. Never- 
theless, the panel supports the call of Ericksen and 
Kadane for further research to understand the degree of 
dependence that exists for various subpopulations and for 
various lists or surveys, i.e., how k depends on the list, 
in addition to the census, that is used and on the 
population studied. 

Recommendation 8.6. We support the Census Bureau's 
research directed toward developing the 1990 Post- 
Enumeration Program and recommend that such research 
emphasize the following areas:

(a) Reduction of post-enumeration survey 
nonresponse; 

(b) Reduction of unresolved matches between 
records for individuals listed in the 
post-enumeration survey and the decennial 
census; 




(c) Validation of the assumptions and/or
development of alternative methodologies with
respect to netting-out of overcounts and
undercounts with reference to the place of
enumeration; and

(d) Investigation of alternatives to the
assumption that the inclusion of individuals in the
post-enumeration survey is unrelated to their
inclusion in the decennial census and the
estimation of the strength of this relationship.



Some Remaining Considerations 

Below we consider two remaining problem areas of the
1980 PEP, timeliness and variance estimation, and
discuss the current approach of the Census Bureau to their
resolution. The panel has no recommendations to make
here other than endorsing the efforts of the Census
Bureau.



Timeliness 

One of the most important aspects of a potential
adjustment program, resulting from the current deadlines
for reapportionment and redistricting, is the timeliness
of the program. In 1980, even preliminary estimates were
not available from the Post-Enumeration Program until
late in 1981. Apart from other considerations, this
factor alone caused the results to be unusable for
purposes of adjustment. There is consequently a
substantial interest in speeding up the Post-Enumeration
Program process, without compromising its quality. In
fact, one of the key elements now under investigation by
the Census Bureau, and mentioned prominently in the
position paper by Kirk Wolter, is testing of the
operational feasibility of an adjustment by December 31,
1990. The possibility of meeting such a deadline might
be enhanced by the use of a pre-enumeration survey and
extended use of automation, both under consideration by
the Census Bureau.

Use of a Pre-enumeration Survey. In order to increase
the total sample size, the 1980 Post-Enumeration Program
made use of the April and August Current Population
Surveys, which served as more or less independent
post-enumeration surveys. It has been suggested that earlier
months of the Current Population Survey could be used, 
which would be ready for matching at the time of the 
decennial census. The January through March Current 
Population Surveys would be possibilities, with March 
having the additional advantage of containing a wealth of 
characteristics information that could be used for pur- 
poses of content evaluation and possibly modeling. Even 
if the Current Population Survey is not used in the 1990 
PEP, the timing of the PEP will involve similar considera- 
tions. 

The advantage of the use of a pre-enumeration survey 
is the possibility of having the survey files ready and 
waiting for the creation of the decennial census files. 
Even so, an appreciable fraction of the matching could 
not be done until personal follow-up was completed. 

A possible disadvantage of a pre-enumeration survey is 
the potential sensitization of the population to the 
decennial census. As a result of the survey experience, 
the pre-enumeration survey interviewees would probably be 
more aware of the upcoming decennial census than the 
general population, and this may affect their actions 
regarding inclusion in the census. (It is not clear 
whether this is likely to lead to a greater or lesser 
desire to be enumerated.) However, sensitization is also 
possible with the use of a post-enumeration survey, since 
the taking of the census may affect cooperation with the 
survey. Sensitization could be reduced by the use of a 
survey that either precedes or succeeds the decennial 
census by a longer time period, say, one or two years. 
However, the panel has strong reservations about that 
idea. As the time period between survey and census 
lengthens, population mobility, deaths, etc., are likely 
to increase problems of accuracy. 

The relative trade-off between a possible sensitization
of the population and the early preparation of pre-
enumeration survey files to be matched to the census is
at this time unknown. This is an area in which research 
is needed; it is a major part of the Census Bureau's 1986 
pretest program for coverage evaluation methodologies. 

Automation. There are currently a number of field 
tests planned by the Census Bureau to determine the most 
effective use of new automation technologies for informa- 
tion collection, transfer, storage, and retrieval (see 
Chapter 3). To date, these tests have concentrated on 
the roles of collection offices, processing offices, and
logistics. Of key importance from the point of view of
coverage evaluation are attempts to generate, very early
on, machine-readable records of the basic identification
of enumerated persons and households, adequate for
computer matching.

In order to exploit the potential existence, at an
early date, of both census and post-enumeration survey
records in machine-readable form, effective computer
matching algorithms have to be developed. In 1980, the
matching was done clerically, a slow process that limited
the search to one enumeration district for each CPS
record. In trying to improve the timeliness of Post-
Enumeration Program estimates, the Census Bureau (see
Wolter, 1984) is placing a great deal of emphasis on the
ability to develop software for automated matching.

The algorithm used by the Census Bureau (see Kelley,
1984a; Jaro, 1985) for computer matching was discussed
in Chapter 4 on matching procedures. We are not recom-
mending any modifications to that basic strategy. How-
ever, we do have one suggestion that the Census Bureau
may wish to investigate further. The idea is to utilize
computers to assist the clerical matching. A large pro-
portion of cases unresolved by the computer matching
algorithm take the form of records in one or the other of
the two files having a multiplicity of possible matching
cases in the other file, but with inadequate evidence to
make a unique status assignment by computer. Such cases
can be presented to clerks on display terminals in a
split-screen fashion for visual inspection and decision.
Some proportion of cases will still remain unresolved
because the reported information is inadequate for match
status determination. However, the efficiency and speed
of dealing with clerically resolvable cases should be
greatly enhanced. (A recent paper indicates that the
Census Bureau is already planning something quite close
to this; see Jaro, 1985.)
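The division of labor between computer and clerk described above can be sketched in a few lines. This is only an illustration under stated assumptions: the field list, the simple agreement count, the threshold of 3, and the `block` code used to limit the search area are all hypothetical and are not the Census Bureau's matching algorithm. Survey records with exactly one strong candidate are matched by computer, records with several candidates are routed to clerical review, and records with none are left as nonmatches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    rec_id: str
    surname: str
    first_name: str
    age: int
    sex: str
    block: str  # hypothetical geographic code that limits the search area


def agreement_score(a: Person, b: Person) -> int:
    """Count agreeing fields; ages agree if they differ by at most one year."""
    score = 0
    score += a.surname.lower() == b.surname.lower()
    score += a.first_name.lower() == b.first_name.lower()
    score += abs(a.age - b.age) <= 1
    score += a.sex == b.sex
    return score


def triage(survey, census, threshold=3):
    """Give each survey record a status: match, clerical, or nonmatch."""
    results = {}
    for s in survey:
        candidates = [
            c for c in census
            if c.block == s.block and agreement_score(s, c) >= threshold
        ]
        if len(candidates) == 1:
            results[s.rec_id] = ("match", [candidates[0].rec_id])
        elif candidates:
            # Several plausible census records: show them to a clerk.
            results[s.rec_id] = ("clerical", sorted(c.rec_id for c in candidates))
        else:
            results[s.rec_id] = ("nonmatch", [])
    return results


# Illustrative records only (not real data).
census = [
    Person("C1", "Smith", "John", 34, "M", "B1"),
    Person("C2", "Smith", "Jon", 35, "M", "B1"),
    Person("C3", "Lopez", "Maria", 28, "F", "B1"),
]
survey = [
    Person("S1", "Smith", "John", 34, "M", "B1"),
    Person("S2", "Lopez", "Maria", 29, "F", "B1"),
    Person("S3", "Brown", "Ann", 50, "F", "B1"),
]
results = triage(survey, census)
```

In this sketch, widening the geographic search would amount to relaxing the `block` comparison, at the price of more multiple-candidate cases for the clerks.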

Automation will have a much greater impact on matching
operations at the Census Bureau than merely speeding up
the processing. For example, it might allow the possibil-
ity of searching a wider geographic area for a matching
record, and hence lessen reliance on the need for finely
balancing local over- and undercounts.



Estimation of Variance Due to Matching 

Should a 1990 version of the Post-Enumeration Program
be used to adjust the population counts, it will be
important to derive estimates of the error attributable
to various causes, including matching. We concentrate
here on the estimation of the variance of the matching 
process. 

Matching can be considered to consist of three phases:
an initial computer match, a subsequent clerical operation
to resolve the more difficult cases, and imputation for
cases whose match status could not otherwise be resolved.
Given two files to be matched, the very nature of a com- 
puter algorithm is such that, conditional upon the files 
and the computer algorithm, there is no variance. Of 
course, stochastic response errors in both the census and
the post-enumeration survey will undoubtedly induce some
matching variance. The estimation of this variance is 
technically feasible but would probably introduce serious 
operational difficulties when superimposed on the other 
rigorous requirements of a coverage evaluation program. 
The rough magnitude of this variance might, however, be 
estimated using some intercensal experiments. 

For the component of the matching that is done cleri-
cally, a combination of designs involving interpenetra-
tion of a sample of matching clerks, together with some
rematching, can readily be established. The design can
be fully analogous to the estimation of interviewer and
response variance (see Hansen et al., 1971; Fellegi,
1964).
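A minimal sketch of how an interpenetrated design yields a clerk variance component follows. It assumes, purely for illustration, that cases are assigned to clerks at random in equal-sized workloads and that the outcome of interest is each clerk's match rate; the function name and the numbers are hypothetical. Under random assignment, the variance of the clerk rates reflects both clerk effects and ordinary sampling variation, so subtracting an estimate of the latter leaves a rough estimate of the former.

```python
from statistics import mean, variance


def clerk_variance(matched_counts, n_per_clerk):
    """Estimate the between-clerk variance of match rates.

    matched_counts[i] is the number of cases clerk i declared matched out of
    n_per_clerk randomly assigned cases (equal workloads assumed).
    """
    rates = [m / n_per_clerk for m in matched_counts]
    overall_rate = mean(rates)
    # Observed variance among clerk rates: clerk effect plus sampling noise.
    between = variance(rates)
    # Unbiased estimate of the binomial sampling variance of each rate.
    within = mean(p * (1 - p) / (n_per_clerk - 1) for p in rates)
    # Truncate at zero, since a variance component cannot be negative.
    return overall_rate, max(0.0, between - within)


# Illustrative workloads only: four clerks, 100 cases each.
overall_rate, clerk_component = clerk_variance([90, 80, 95, 85], 100)
```

Rematching a subsample of each clerk's cases by a second clerk would additionally allow the simple response variance to be separated from the correlated clerk effect, which this sketch does not attempt.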



SYSTEMATIC OBSERVER METHODOLOGY: CONSIDERATIONS FOR 1990 

It is generally recognized that a serious undercount 
problem exists for some members of poor minority groups 
living in large central cities. There are also indica- 
tions that in these areas the largest number of indi- 
viduals are missed through incomplete reporting of house- 
hold members rather than through failure to enumerate the 
households themselves. In particular, demographic studies 
using sex ratios seem to indicate that a disproportionate 
number of adult black males are missed by the census. 
Other studies suggest that it is unrealistic to expect 
improved traditional interview or self-enumeration pro-
cedures to increase substantially the coverage of such
individuals. Finally, it is also for such individuals 
and for such areas that the Census Bureau has experienced 
the greatest difficulties in using matching to estimate 
undercount. These general perceptions, in conjunction 
with the resident observer study of Charles and Betty
Valentine described in Chapter 5, provide the basis
for this section, which outlines a research program aimed
at finding out who is missed and at developing methods
to estimate the number of individuals missed.

In 1972, the Census Bureau asked the Advisory Committee
on Problems of Census Enumeration of the National Research
Council to assess the Valentine study. The committee
reviewed the study and suggested that the Census Bureau
continue to support such studies. The Census Bureau
contacted additional anthropologists and undertook to
support graduate student participant observer studies.
For a number of reasons, including personnel problems,
none of the studies was completely successful,
them took the form of Census Bureau support for
students in a graduate program at a university.

We believe in the potential of a trained individual,
through normal, day-to-day encounters, to become aware of
people in his or her neighborhood who would be diffi-
cult to enumerate through typical census procedures.
There is a major difference between an effort of this
kind and the anthropological studies such as the Valentine
study. In the Valentine study, a considerable amount of
personal information, such as sources of income and per-
sonal relationships, was obtained (and kept confidential,
of course). In the type of study proposed in this sec-
tion, the only information obtained would be name, age,
race, sex, and address. This difference in degree of
invasiveness might prevent the occurrence of the problems
experienced in resident observer studies conducted after
the Valentine study. The proposed study makes use of a
type of enumeration similar to that used in Cai
described in Chapter 5. The different objectives of the
proposed study require the use of a term different from
the anthropological one of resident observers; we have
adopted the term systematic observers.

Research activity on systematic observation should be
coordinated with pretests being conducted for the 1990
census, but such research is not restricted to pretest
activities. Consistent with the two terms resident
observation and systematic observation, we envision two
possible types of studies. In resident observer studies,
similar to the Valentine study, anthropologists live in
an area on an essentially full-time basis for a con-
siderable period of time. In such studies, highly
trained professionals attempt to identify the reasons for
noncompliance and misreporting as well as to quantify the
magnitude of the problem. The identification of the
reason for noncompliance, especially with respect to
different population subgroups, is vital for understand-
ing how coverage improvement and coverage evaluation
might be improved to help minimize the problem of dif-
ferential undercoverage. Observers of this type could be
placed in a number of different types of localities. 
Brooks (1974) suggested that research could profitably be 
conducted in the following types of study areas: 

(1) A Mexican-American community in the Southwest; 

(2) A transplanted Appalachian community in the urban 
north central region; 

(3) An urban black community in the north central 
region; 

(4) A northeastern Puerto Rican community; 

(5) A northeastern black urban community; 

(6) A Navajo reservation; 

(7) A black southern rural community; 

(8) A white ethnic community; and 

(9) A white or mixed southern urban area. 

Resident observer studies might provide information 
leading to the development of alternative data collection 
procedures. 

The second type of activity, called systematic obser- 
vation, would employ less highly trained professionals. 
The observers would live in the area and become familiar 
enough with the residents to make reliable reports on the 
number of persons in each of a small number of households 
at a particular date, as well as their name, age, sex, 
and race. It is conjectured that this activity would 
require only a fraction of an employee's time. By initi- 
ating several systematic observer studies at the earliest 
possible time, the following questions can be 
investigated: 

(1) How difficult is it to recruit, train, and 
position systematic observers? 

(2) How long must systematic observers reside in an 
area before they can provide reliable data on 
residents? 

(3) How large an area (number of households) can a 
systematic observer be expected to provide 
reliable data for? 

(4) What procedures can be used to validate the 
quality of the data provided by the systematic 
observers?

(5) Are different procedures required in different
types of areas?

(6) Can problems of perceived invasion of privacy be
overcome?

Recommendation 8.7. We recommend that the Census Bureau
initiate a research program on systematic observers,
with a view toward the use of this method for a i 
of areas at the time of the 1990 census. 

Naturally, the results of a research program are
unknown at the present time. However, to make clear the
nature of our objectives, we outline a possible scenario
for the use of systematic observers. The first step in
the process would be the delineation of the area of study.
This would include, but would not necessarily be limited
to, the low-income areas of large central cities. The
Census Bureau would draw an area sample of segments, each
containing, say, 20 housing units. The Census Bureau
would then recruit a full-time Census Bureau employee to
live in or near each sample segment for a period of about
one year beginning at least nine months prior to Census
Day for the 1990 census. The individuals recruited would
be employees of the Census Bureau, and as such would be
sworn to uphold the confidentiality of the information
collected, and would be subject to fines and imprisonment
for any betrayal of that responsibility. In those cities
in which the Census Bureau had offices, the individuals
could spend part of their time as office employees of the
Census Bureau. A condition of their employment would be
that they live in the study area and that they become
knowledgeable about the nature and composition of house-
holds in the area assigned to them. Living within the
area, they would identify themselves as employees of the
Census Bureau and would explain that part of their job is
to become familiar with the community. As full-time
employees of the Census Bureau, they would be instructed
in procedures for data collection and in the techniques
of systematic observation. At some point in the census
procedure, presumably a few weeks before Census Day, the
systematic observers would prepare a listing of households
in their designated area, and indicate the household
composition.

The need for the systematic observers to identify them-
selves as employees of the Census Bureau raises an impor-
tant question as to whether the local area will be sensi-
tized to the decennial census when it is taken, that is,
whether the individuals residing in the area will be
counted more or less well than the population in general. 
The proposed studies should attempt to measure the extent 
of any such sensitization. 

The systematic observers could be used in other aspects 
of the census operation. For example, they might be used
as enumerators or supervisors in the general area, but at 
some distance from the area segment for which they had 
primary responsibility. The area segment for which the 
systematic observer reported household composition
would be enumerated in the census by a different census 
enumerator operating under an independent supervisor. 

The data collected in the regular census enumeration 
could be matched against the data collected by the sys-
tematic observer. Because the original study area seg-
ments were randomly chosen, it would be possible to con-
struct an estimator of the net number missed by race,
age, and sex. It would also be possible to make esti-
mates of household composition for the population. Some
details of the sampling calculations underlying this 
statement are contained in Appendix 8.2. 
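One simple form such an estimator could take is sketched here. It assumes, for illustration only, an equal-probability sample of segments drawn without replacement from a known number of segments in the study area; the design actually contemplated in Appendix 8.2 may differ, and the function name and counts are hypothetical. For a given race, age, and sex group, the difference between the observer's count and the census count in each sample segment is averaged and expanded to the whole area.

```python
import math
from statistics import mean, variance


def estimate_net_missed(observer_counts, census_counts, total_segments):
    """Expand per-segment net misses to an area total with a standard error.

    observer_counts[i] and census_counts[i] are the numbers of persons in one
    demographic group found in sample segment i by the systematic observer
    and by the census. total_segments is the number of segments in the area.
    """
    diffs = [o - c for o, c in zip(observer_counts, census_counts)]
    n = len(diffs)
    total = total_segments * mean(diffs)
    # Finite population correction for sampling without replacement.
    fpc = 1 - n / total_segments
    std_error = total_segments * math.sqrt(fpc * variance(diffs) / n)
    return total, std_error


# Illustrative counts only: 4 sample segments out of 1,000 in the area.
net_missed, std_error = estimate_net_missed(
    [52, 47, 60, 55], [50, 47, 55, 54], 1000
)
```

Because the differences are net, a segment in which the census counted more persons than the observer contributes a negative value, so erroneous enumerations offset omissions in the total.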

It must be stressed that the ethical and public rela- 
tions dimensions of such an operation are the most prob- 
lematic and must be considered with great care, since 
there is the possibility of these types of studies being
perceived as an invasion of privacy. The authors of the
Valentine report, of ethnographic and anthropological
studies such as Tally's Corner (Liebow, 1967), and of
internal memos of the Census Bureau conclude that the 
ethical problems are not insurmountable. Moreover, the 
resident observer studies indicate that a person whose 
avowed interest is the study of the community will be 
tolerated by that community. Some of these issues were 
addressed in an October 1974 memo by Harold Nisselson, 
Chairman of the 1980 Census Coverage Committee. The 
basic feeling of the coverage committee was that such 
studies, while sensitive, can be defended as being 
responsible scientific studies. They can be designed in 
a manner such that little or no disruption of the 
activities of the members of the community need occur,
and such that information has a minimal chance of being
disclosed. As mentioned above in Chapter 5, the pos-
sibility of using focus groups should be considered here,
both to assess the ethical and public relations dimensions
of systematic observation, as well as to help understand 
ways in which these studies may be made more effective.

Systematic observer studies are expensive, but the
total cost of including in the census a broad sample of
the type described might be comparable to many of the
activities used in the 1980 census to increase coverage.
The cost of a large systematic observer study is about
the same order of magnitude as the post-enumeration
studies being considered as a part of a census evaluation
and adjustment program. Some rough cost considerations
are also contained in Appendix 8.2.



APPENDIX 8.1 

THE POPULATION OF ILLEGAL ALIENS: 
METHODS AND ESTIMATES 

Over the past 10 years or so, concern with the number of 
illegal migrants, and particularly with those coming from 
Mexico, has been accompanied by a plethora of estimates
of their numbers. In most cases the interest is in esti- 
mating the stock of illegal migrants at some point in 
time. There are a few examples, however, of attempts to 
estimate yearly flows. A sufficiently long series of 
estimates of the latter, in combination with appropriate 
information on survival patterns (determined by mortality 
and return migration), yields an estimate of the stock of
migrants. The estimates for illegal aliens residing in 
the United States have ranged from as little as 600,000 
during the mid-1970s (Robinson, 1980) to a high of about 
8.2 million around 1975 (Lesko and Associates et al.,
1975). In a recent study, Warren and Passel (1983) pro- 
vided a lower bound for the estimates of illegal migrants 
by estimating all those who were counted during the 1980 
census. Their final figure of about 2.0 million is 
reasonably close to other estimates of total illegal 
migrants residing in the United States during the 1970s. 
Thus, Lancaster and Scheuren (1978) obtained a figure for 
the age range 18-44 in 1973 of about 4.0 million as the 
midpoint of a subjective confidence interval with 1.4 and 
5.72 million as extremes. Bean et al. (1983) cal- 
culated that the correct figure for Mexicans in 1980 
should be not less than 1.5 million and no more than 3.8 
million (though the lower bound depends heavily on numer- 
ous assumptions), whereas Korns (1979) calculated an
estimate of about 2.0 million. 
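The relation between flows and stocks mentioned at the start of this appendix can be made concrete with a short sketch. Each year's inflow is weighted by the proportion of that entry cohort still present at the reference date, after mortality and return migration. The function name and all numbers below are hypothetical illustrations, not estimates drawn from any of the studies cited.

```python
def stock_from_flows(flows, survival):
    """Combine yearly inflows with survival proportions to estimate a stock.

    flows[k] is the number of entrants k years before the reference date.
    survival[k] is the proportion of such entrants still resident after
    k years (net of deaths and return migration).
    """
    if len(flows) > len(survival):
        raise ValueError("need a survival proportion for every entry cohort")
    return sum(f * s for f, s in zip(flows, survival))


# Illustrative values only: 100,000 entrants a year, 80 percent of each
# cohort remaining from one year to the next.
stock = stock_from_flows([100_000, 100_000, 100_000], [1.0, 0.8, 0.64])
```

The sketch shows why a sufficiently long series of flows is needed: cohorts omitted from the series contribute nothing to the sum, so a short series understates the stock unless survival falls off quickly.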

Table 8.1 classifies the available estimates according 
to a combination of characteristics. The first one is 
the quantity being estimated, stocks or flows. Estimates
of stocks that are derived from original estimates of
flows are classified as being part of the latter's set.
The second characteristic is the type of information
source used. We distinguish three autonomous sources
(special surveys of migrants or of migrants' families,
apprehension data, and departure and arrival records)
that can be used in combination with conventional sources
such as censuses, surveys, and vital statistics of the
country of origin of the migrants or of the United States
or both.















[Table 8.1, classifying the available estimates of the
illegal alien population by quantity estimated and by type
of information source, is not legible in this copy.
Legible entries include: a match of CPS, IRS, and SSA
records (range of estimates: 2.9-5.7 million); Warren
(1981), a comparison of estimates from CPS and INS counts
(a point estimate for 1979); and Bean et al. (1983), an
analysis for the Mexican population (for 1980, an
estimated 1.5-3.8 million illegal Mexican residents).]



It is important to [...] the methods [...]
estimates described above are not necessarily tailored
to the [...] illegal migrants, but to the fine tuning
[...] migrants [...] are not counted in official [...]
(missed by the census, for example), whereas others
[...] only geared to the production of estimates of
stocks of illegal migrants. While the former sub-
population group possibly contributes to census under-
count, the second escapes altogether from migration
records and may affect the accuracy of methods to
evaluate census coverage.

As methods of improved census coverage are refined,
the magnitude of the net undercount may become, in
absolute terms, quite insignificant. However, the

ly, and Scheuren, Frederick
Background for an administrative records
census. Pp. 137-146 in American Statistical

Washington, D.C.:
American Statistical Association.

American Statistical Association
Report of the ASA technical panel on the
census undercount. American Statistician
38(4):252-256.

L.G., Petersen, G.R., Pecitti, E.B., and Smith,
The California Automated Mortality Linkage
System [CAMLIST]. American Journal of Public
Health 74(12):1324-1330.

bara
Error profiles: Uses and abuses. Pp. 117-130
in Tommy Wright, ed., Statistical Methods and
the Improvement of Data Quality. New York:
Academic Press.
Counting versus estimation in a census: A
difficult decision. Pp. 42-49 in American

Washington,
D.C.: American Statistical Association.
Affidavit Presented to District Court,
Southern District of New York, Mario Cuomo, et
al., Plaintiff(s), Malcolm Baldrige, et al.,
Defendant(s), 80 Civ. 4550 (JES).
Comment from the Census Bureau. American
Statistician 38(4):257-260.






t _ < w t -_. ,f - > =. Oll d cost are devel- 

' " . ( _t LJ* e order of mag- 
,- - " __, ! - - ?.-_ , -<1 lit a large sys- 

* _ i"' 1^ L f-u s 3 be desirable to 

j _ ^G^iefii areas such as 

i - i i ^ -* i <*=L j'f variability 

i i j ^ " ., 3 rates than are 

/ ' ' ' 1 L" " J I?IL jndercount for th-; 

f . . i , , r *- /c ' but a 5 percent 

\^ - n -, ^ ,r-T / f * ) - por Dilation that is 

> i c ^! --i'ltt- a figure that will 
i u r j,^ jci'iacy per unit 

L i - i - / ' i . f ~ i ci^erver is respon- 

_r-n t , -i t i ip u^jylt, that 50 
r ! * ? -ount^d as part of 

j 'it - ; r L " mf iio people are 

^i r oi rbe miss rate 
t j j MJ j v t vMjj J oe 100 per- 

/ n, i - i I'lv containing 400 

* i n t ' >< i ' , i f ^iir- Ant of variation 

i ( , * v-j rf. o 312,000, the 

t < i t f )t 400 sampling 

. r , t i P f i j v, dollars. This 

. ^ > i i " j vsUmatic observers 

. i r * i, L , i , , * M , r , !lL f including fringe 

? a , i ' r l f - c 1 i u* : i ^ ihj c uiiie that obser- 

1 _ ' ' s 1 i f ,.n , > t i t ^QJ talent of on. e~ 

- ' * / j i M ^ r inn I flit supervision and 

t- 'f i to 1 v c f i f j s id c-osts, a study 

i ( , , i uj j i c - 1 /* ? v;culd cost on the 






Washington, D.C.: U.S. Department of Commerce.
The 1950 Censuses: How They Were Taken.
Washington, D.C.: U.S. Department of Commerce.
The Post-Enumeration Survey: 1950. Technical
Paper No. 4. Washington, D.C.: U.S.
Department of Commerce.

Evaluation and Research Program of the U.S.
Censuses of Population and Housing, 1960:
Accuracy of Data on Population Characteristics
as Measured by CPS-Census Match. Series ER
60, No. 5. Washington, D.C.: U.S. Department
of Commerce.

Evaluation and Research Program of the U.S.
Censuses of Population and Housing, 1960:
Record Check Studies of Population Coverage.
Series ER 60, No. 2. Washington, D.C.: U.S.
Department of Commerce.
1960 Censuses of Population and Housing:
Procedural History. Washington, D.C.: U.S.
Department of Commerce.

Evaluation and Research Program of the U.S.
Censuses of Population and Housing, 1960:
Record Check Study of Accuracy of Income
Reporting. Series ER 60, No. 8. Washington,
D.C.: U.S. Department of Commerce.
1970 Census User's Guide. Part 1.
Washington, D.C.: U.S. Department of Commerce.
Testing Census Coverage Through Drivers'
Licenses. 1970 Census Preliminary Evaluation
Results Memorandum Series No. 21. Washington,
D.C.

1970 Census of Housing. Vol. I, Chapter B,
Detailed Characteristics. Part 1, United
States Summary. Final Report HC(1)-A1.
Washington, D.C.: U.S. Department of Commerce.
1970 Census of Population. Vol. I, Chapter C,
General Social and Economic Characteristics.
Part 1, United States Summary. Final Report
PC(1)-C1. Washington, D.C.: U.S. Department
of Commerce.

Population and Housing Inquiries in U.S.
Decennial Censuses, 1790-1970. Working Paper
39. Washington, D.C.: U.S. Department of
Commerce.




Balinski, M.L., and Young, H.P.

1982 Fair Representation: Meeting the Ideal of One
Man, One Vote. New Haven: Yale University
Press.

Bean, F.D., King, A.G., and Passel, J.S.

1983 The number of illegal migrants of Mexican
origin in the United States: Sex ratio-based
estimates for 1980. Demography 20(1):99-109.

Bishop, Y.M.M., Fienberg, S.E., and Holland, P.W.
1975 Discrete Multivariate Analysis: Theory and
Practice. Cambridge, Mass.: MIT Press.
Bounpane, Peter

1983 Affidavit Presented to District Court,
Southern District of New York, Mario Cuomo, et
al., Plaintiff(s), Malcolm Baldrige, et al.,
Defendant(s), 80 Civ. 4550 (JES).

Brooks, C.A.

1974 Proposed Use of Participant-Observe:
Studying the Problems of Census Und<
in Selected Areas. Unpublished memorandum.
Bureau of the Census, Washington, D.C.

Brooks, C.A., and Bailar, B.A.

1978 An Error Profile: Employment as Measured by
the Current Population Survey. Statistical
Policy Working Paper No. 3. Washington,
D.C.: U.S. Department of Commerce.

Brown, Rachel

1984 Research plan on the uses of administrative
records for the 1990 census. Pp. 44
American Statistical Association 19J
Proceedings of the Social Statistics Section.
Washington, D.C.: American Statistical
Association.

Bryce, Herrington J.

1980 The impact of the undercount on state and
local government transfers. Pp. IIS
Proceedings of the 1980 Conference on Census
Undercount. Bureau of the Census. Washington,
D.C.: U.S. Department of Commerce.
Bureau of the Census

1944 U.S. Census of Population: 1940, D:
Fertility 1940 and 1910, Standardize



An Overview of Population and Housing Census
Evaluation Programs Conducted at the Bureau of
the Census. Background paper for the March
1978 meeting of the Census Advisory Committee
of the American Statistical Association,
Washington, D.C.

The Meaning of Enumeration. 1990 Planning
Conference Series No. 1. Washington, D.C.:
U.S. Department of Commerce.

User's Guide - 1980 Census of Population and
Housing. Part A, Text. Washington, D.C.:
U.S. Department of Commerce.

Statistical Abstract of the United States:
1982-83. 103rd Edition. Washington, D.C.:
U.S. Department of Commerce.

Report of the United States Bureau of the
Census in Response to the Order of the Court,
Ordered August 6, 1982. Report submitted to
the District Court, Southern District of New
York, Mario Cuomo, et al., Plaintiff(s),
Malcolm Baldrige, et al., Defendant(s), 80
Civ. 4550 (JES).

Some Major Issues in Planning the 1990
Census. Background paper for Data Needs of
America in Transition, Workshop on the 1990
Census, Congressional Research Service,
Washington, D.C.

Introduction and overview of the 1980 census.
Chapter 1 in History of the 1980 Census of
Population and Housing. Draft. Washington,
D.C.

Census of Population and Housing, 1980: Public
Use Microdata Samples Technical Documentation.
Washington, D.C.: U.S. Department of Commerce.

1980 Census of Population. Vol. 1, Chapter B,
General Population Characteristics. Part 1,
U.S. Summary. PC 80-1-B1. Washington, D.C.:
U.S. Department of Commerce.

Population and Housing Content Items. Chapter
13 in History of the 1980 Census of Population
and Housing. Draft. Washington, D.C.

1980 Census of Population. Vol. 1, Chapter C,
General Social and Economic Characteristics.
Part 1, U.S. Summary. PC 80-1-C1. Washington,
D.C.: U.S. Department of Commerce.



1970 Census of Population and Housing
Evaluation and Research Program: The Medicare
Record Check: An Evaluation of the Coverage
of Persons 65 Years of Age and Over in the
1970 Census. PHC(E)-7. Washington, D.C.:
U.S. Department of Commerce.

1973e 1970 Census of Population and Housing
Evaluation and Research Program: Test of
Birth Registration Completeness 1964 to 1968.
PHC(E)-2. Washington, D.C.: U.S. Department
of Commerce.

1974a 1970 Census of Population and Housing
Evaluation and Research Program: Estimates of
Coverage of Population by Sex, Race, and Age:
Demographic Analysis. PHC(E)-4. Washington,
D.C.: U.S. Department of Commerce.

1974b 1970 Censuses of Population and Housing
Evaluation and Research Program: Effect of
Special Procedures to Improve Coverage in the
1970 Census. PHC(E)-6. Washington, D.C.:
U.S. Department of Commerce.
1970 Census of Population and Housing
Evaluation and Research Program: Accuracy of
Data for Selected Housing Characteristics as
Measured by Reinterviews. PHC(E)-10.
Washington, D.C.: U.S. Department of Commerce.
1970 Census of Population and Housing
Evaluation and Research Program: Accuracy of
Data for Selected Population Characteristics
as Measured by the 1970 CPS-Census Match.
PHC(E)-11. Washington, D.C.: U.S. Department
of Commerce.

1976a 1970 Census of Population and Housing:
Procedural History. Final Report PHC(R)-1.
Washington, D.C.: U.S. Department of
Commerce.

1976b Cost Estimates for a 1985 Mid-Decade Census
(Distributed by Fiscal Year). Unpublished
tables (March 3, 1976).

1976c Cost Approximations for Alternative 1985
Population and Housing Census, Survey
Purposes. Unpublished tables (March 4, 1976).



DJUJL t.e uex 

os Estados Unidos (ENEPNEU). Secretaria
del Trabajo y Prevision Social. Centro
Nacional de Informacion y Estadisticas del
Trabajo. Mexico City.

nny R., and Hogan, Howard R.
Census experimental match studies. Pp.
173-176 in American Statistical Association
1983 Proceedings of the Survey Research
Section. Washington, D.C.: American
Statistical Association.

The IRS/Census Direct Match Study - Final
Report. SRD Research Report No.
Census/SRD/RR-84/11. Bureau of the Census,
Statistical Research Division. Washington,
D.C.: U.S. Department of Commerce.

Matching IRS records to census records: Some
problems and results. Pp. 301-306 in American
Statistical Association 1984 Proceedings of
the Survey Research Section. Washington,
D.C.: American Statistical Association.

ance F.
Imputation Rates for Selected Decennial Census
Population and Housing Items. Unpublished
paper. National Research Council, Panel on
Decennial Census Methodology, Washington, D.C.

y J.
The population of the United States in 1950
classified by age, sex, and color - A revision
of census figures. Journal of the American
Statistical Association 50:16-54.

y J., and Rives, Norfleet W., Jr.
A statistical reconstruction of the black
population of the United States, 1880-1970:
Estimates of true numbers by age and sex, birth
rates, and total fertility. Population Index
39(1):3-36.

y J., and Zelnik, Melvin
New Estimates of Fertility and Population in
the United States: A Study of Annual White
Births from 1855 to 1960 and of Completeness
of Enumeration in the Censuses from 1880 to
1960. Princeton, N.J.: Princeton University
Press.



the Census Advisory Committee on Population 
Statistics, Washington, D.C. 

1984c Pretest of Adjustment Operations. Unpublished 
paper. Washington, D.C. 

1985a The Coverage of Housing in the 1980 Census. 
Unpublished paper. Washington, D.C. 

1985b General Description of Tasks and Where r . 
Will Be Done. Draft (rev. April 2, 198! 
Washington, D.C. 

1985c Report on 1980 Census Coverage Improvement 
Program Evaluation. Draft. Washington, D.C. 

no date(a) Coverage Evaluation and Coverage Improvement 
at the Census Bureau Since 1950. Unpublished 
paper. Washington, D.C. 

no date(b) Redistricting and the Decennial Census. 
Unpublished paper. Washington, D.C. 
Butz, William 

1984 Data Confidentiality and Public Percept; 
The Case of the European Censuses. Paper 
presented to the Population Association of 
America, Minneapolis, Minnesota. 
Carlucci, Carl P. 

1980 The impact of an adjustment to the 1980 census 
on congressional and legislative reapportionment. 
Pp. 145-152 in Proceedings of the 1980 
Conference on Census Undercount. Bureau of 
the Census. Washington, D.C.: U.S. Department 
of Commerce. 
Carter, Grace M., and Rolph, John E. 

1974 Empirical Bayes methods applied to estimating 
fire alarm possibilities. Journal of the 
American Statistical Association 69:880- 

Cassel, Claes-Magnus, Sarndal, Carl-Erik, and Wretman, 
Jan H. 

1983 Some uses of statistical models in connection 
with the nonresponse problem. Pp. 143-1 
William Madow and Ingram Olkin, eds., 
Incomplete Data in Sample Surveys, Vol. 
Proceedings of the Symposium. Panel on 
Incomplete Data, Committee on National 
Statistics, National Research Council. New 
York: Academic Press. 



xiuoujLajLxcm axt.Udt.lon* J/p. JL3/ loo in 

Proceedings of the 1980 Conference on Census 
Undercount. Bureau of the Census. Washington, 
D.C.: U.S. Department of Commerce, 
idley, and Morris, Carl 

Limiting the risk of Bayes and empirical Bayes 
estimators part II: The empirical Bayes case. 
Journal of the American Statistical Association 
67:130-139. 

Stein's estimation rule and its competitors 
An empirical Bayes approach. Journal of the 
American Statistical Association 68:117-130. 
uta, Campbell, Valencia, and Freedman, Stanley 
Distributing federal funds: The use of 
statistical data (preliminary report) . 
Statistical Reporter December: 73-90. 
Eugene P. 

Affidavit Presented to District Court, 
Southern District of New York, Mario Cuomo, et 
al., Plaintiff(s), Malcolm Baldrige, et al., 
Defendants, 80 Civ. 4550 (JES). 
Surrebuttal Presented to District Court, 
Southern District of New York, Mario Cuomo, et 
al., Plaintiff(s), Malcolm Baldrige, et al., 
Defendants, 80 Civ. 4550 (JES). 
Eugene P., and Kadane, Joseph B. 
Using administrative lists to estimate census 
omissions: An example. Pp. 361-365 in 
American Statistical Association 1983 
Proceedings of the Survey Research Methods 
Section. Washington, D.C.: American 
Statistical Association. 

Estimating the population in a census year: 
1980 and beyond. Journal of the American 
Statistical Association 80: 98-131. 
aine, Rothwell, Donny, and Mockovak, William 
Analysis of Mail-Return Rates for the 
Alternative Questionnaires Experiment. 
Preliminary Evaluation Results Memorandum No. 
16. Bureau of the Census, Washington, D.C. 

E., III, and Herriot, Roger 
Estimates of income for small places: An 
application of James-Stein procedures to 
Census data. Journal of the American 
Statistical Association 74(366):Part I 
261-277. 



Printing Office. 
Conk, Margo A. 

1981 Accuracy, efficiency, and bias: The 
interpretation of women's work in the U.S. census 
of occupations, 1890-1940. Historical Methods 
14(2):65-72. 

1984 The 1980 census in historical perspective. 
William Alonso and Paul Starr, eds., The 
Political Economy of National Statistics. New 
York: Basic Books. 
Cormack, R.M. 

1981 Loglinear models for capture-recapture 
experiments on open populations. Pp. 197- 
in R.W. Hiorns and D. Cooke, eds., The 
Mathematical Theory of the Dynamics of 
Biological Populations II. New York: 
Academic Press. 
Cowan, Charles D. 

1983 Affidavit Presented to District Court, 
Southern District of New York, Mario Cuomo, et 
al., Plaintiff(s), Malcolm Baldrige, et al., 
Defendants, 80 Civ. 4550 (JES). 
Cowan, Charles D., and Bettin, Paul J. 

1982 Estimates and Missing Data Problems in the 
Post Enumeration Program. Unpublished paper. 
Statistical Methods Division, Bureau of the 
Census, Washington, D.C. 
Cowan, Charles D., and Fay, Robert E. 

1984 Estimates of undercount in the 1980 census. 
Pp. 560-565 in American Statistical Association 
1984 Proceedings of the Survey Research Methods 
Section. Washington, D.C.: American 
Statistical Association. 
Deming, W.E., and Stephan, F.F. 

1940 On a least squares adjustment of a sampled 
frequency table when the expected marginal 
totals are known. Annals of Mathematical 
Statistics 11:427-444. 
Department of Commerce 

1976 Mid-Decade Census. Unpublished memoranda 
(June 15, 1976). Washington, D.C. 



Accounting Office 

Bureau of the Census Cost Estimates for 
Mid-Decade Census Proposals. Report to the 
Subcommittee on Census and Statistics, 
Committee on Post Office and Civil Service, 
House of Representatives. Washington, D.C.: 
U.S. Government Printing Office. 
A $4 Billion Census in 1990? Timely Decisions 
on Alternatives to 1980 Procedures Can Save 
Millions. Washington, D.C.: U.S. Government 
Printing Office. 
Leon 

Affidavit Presented to District Court, Southern 
District of New York, Mario Cuomo, et al., 
Plaintiff(s), Malcolm Baldrige, et al., 
Defendants, 80 Civ. 4550 (JES). 

, H. 

Estimates of Emigration from Mexico and 
Illegal Entry into the United States, 
1960-1970, by the Residual Method. 
Unpublished graduate research paper. Center 
for Population Research, Georgetown University. 

, Maria Elena 

Characteristics of formulas and data used in 
the allocation of federal funds. The American 
Statistician 34(4):200-211. 

, J.F. 

Reverse record check: Tracing people in 
Canada. Pp. 269-274 in American Statistical 
Association 1980 Proceedings of the Survey 
Research Methods Section. Washington, D.C.: 
American Statistical Association. 

i, J.F., and Theroux, G. 

RRC - Methodology Report, Part I. Internal 
report, Statistics Canada, Ottawa. 
RRC - Results on Population and Household 
Undercoverage. Internal report, Statistics 
Canada, Ottawa. 

RRC - Supplementary Results on Population and 
Household Undercoverage. Internal report, 
Statistics Canada, Ottawa. 

RRC - Methodology Report, Part II. Internal 
report, Statistics Canada, Ottawa. 



Commerce. 
Fellegi, Ivan 

1964 Response variance and its estimation. Journal 
of the American Statistical Association 
59:1016-1041. 
1980a Evaluation programme of the 19* Census of 
Population and Housing - A sampling. The 
Statistician 29(4):275-312. 
1980b Should the census count be adjusted for 
allocation purposes - equity considerations. 
Pp. 193-203 in Proceedings of the 1980 
Conference on Census Undercount. Bureau of 
the Census. Washington, D.C.: U.S. Department 
of Commerce. 
1984 Notes on Census Coverage Estimation 
Methodologies. Prepared for the March 1984 
meeting of the Panel on Decennial Census 
Methodology, National Research Council. 
1985 Discussion of "Estimating the population in a 
census year: 1980 and beyond." Journal of 
the American Statistical Association 80:98-131. 
Fellegi, Ivan, and Holt, D. 

1976 A systematic approach to automatic edit and 
imputation. Journal of the American 
Statistical Association 71:17-35. 
Fellegi, Ivan, and Sunter, Alan 

1969 A theory for record linkage. Journal of the 
American Statistical Association 64:1183-1210. 
Fernandez, Edward, and McKenney, Nampeo 

1980 Identification of the Hispanic population: A 
review of Census Bureau experiences. Pp. 
358-363 in American Statistical Association 
1980 Proceedings of the Survey Research 
Methods Section. Washington, D.C.: American 
Statistical Association. 
Ferrari, Pamela W., and Bailey, Leroy 

1983 1980 Census Telephone Follow up Experiment 
- Preliminary Assessments and Implications. 
Unpublished paper. Bureau of the Census, 
Washington, D.C. 
Fienberg, Stephen E. 

1970 An iterative procedure for estimation in 
contingency tables. The Annals of 
Mathematical Statistics 41(3):907-917. 



The synthetic method: Its feasibility for 
deriving the census undercount for states and 
local areas. Pp. 129-144 in Proceedings of 
the 1980 Conference on Census Undercount. 
Bureau of the Census. Washington, D.C.: U.S. 
Department of Commerce. 

rt B., and Steffes, Robert B. 
Estimating the 1970 Census Undercount for 
State and Local Areas. Unpublished paper. 
National Urban League Data Service, Washington, 
D.C. 

.iam F. 

Data Collection in a Changing Environment. 
Paper presented at the October 1984 meeting of 
the Census Advisory Committee, Washington, D.C. 

'ard 

Preliminary Analysis of the Relation Between 
Census Mailback Rates and Gross Undercoverage. 
Unpublished paper. Bureau of the Census, 
Washington, D.C. 

Exploratory Analysis of PEP Data. Unpublished 
paper. Bureau of the Census, Washington, D.C. 
Exploratory Analysis of PEP Data - Addendum to 
Interim Report. Unpublished paper. Bureau of 
the Census, Washington, D.C. 
Research Plan on Adjustment for the 1990 
Decennial Census. Unpublished paper. Bureau 
of the Census, Washington, D.C. 
Research plan on adjustment. Pp. 452-457 in 
American Statistical Association 1984 
Proceedings of the Social Statistics Section. 
Washington, D.C.: American Statistical 
Association. 

Thoughts on a Rule for When to Adjust. 
Unpublished memorandum. Bureau of the Census, 
Washington, D.C. 

ard, and Cowan, Charles 

Imputations, response errors, and matching in 
dual-system estimation. Pp. 263-268 in 
American Statistical Association 1980 
Proceedings of the Survey Research Methods 
Section. Washington, D.C.: American 
Statistical Association. 



1984 Geographical Mobility: March 1982 to March 
1983. Current Population Reports, Population 
Characteristics P-20, No. 393. Bureau of the 
Census. Washington, D.C.: U.S. Department of 
Commerce. 
Hansen, M., Hurwitz, W., and Bershad, M. 

1971 Measurement errors in censuses and surveys. 
Bulletin of the International Statistical 
Institute 38:359-374. 
Harville, David A. 

1976 Extension of the Gauss-Markov theorem to 
include the estimation of random effects. 
Annals of Statistics 4:384-395. 
Henderson, Charles R. 

1975 Best linear unbiased estimation and prediction 
under a selection model. Biometrics 
31:423-447. 
Herriot, Roger A. 

1984 Final Report of 1990 Requirements Planning 
Committee. Unpublished paper. Bureau of the 
Census, Washington, D.C. 
Herriot, Roger A., and Speaker, Robert C. 

1984 Residence rules for the 1990 decennial 
census. Pp. 449-451 in American Statistical 
Association 1984 Proceedings of the Social 
Statistics Section. Washington, D.C.: 
American Statistical Association. 
Higginbotham, J.R., and Cox, K.K. 

1979 Focus Group Interview; A Reader. Chicago: 
American Marketing Association. 
Hill, J. 

1911 Letter to William C. Huston, Chairman, 
Committee on the Census, April 25, 1911. 
U.S. Congress, House Apportionment of 
Representatives. H.R. 12. 62nd Congress, 
Session. 
Hill, Kenneth 

1985 Indirect approaches to assessing stocks and 
flows of migrants. Pp. 205-224 in App. 
Daniel B. Levine, Kenneth Hill, and Robert 
Warren, eds., Immigration Statistics, A Story 
of Neglect. Panel on Immigration Statistics, 



Pittsburgh, Pa.: Department of Statistics, 
Carnegie-Mellon University. 

Kadane, Joseph B., and Ericksen, Eugene 
Revised estimates of state and central city 
populations on Census Day, 1980. Pp. 208-210 
in American Statistical Association 1984 
Proceedings of the Social Statistics Section. 
Washington, D.C.: American Statistical 
Association. 

Kadane, Joseph B., and Lehoczky, J. 
Random juror selection from multiple lists. 
Operations Research 24:207-219. 

ien, and Smith, Robert 
Preliminary Results of the 1980 Content 
Reinterview Study. Preliminary Evaluation 
Results Memorandum No. 67. Bureau of the 
Census, Washington, D.C. 

Patrick 
Notes on the Theory and Practice of Exact 
Matching. Unpublished paper. Bureau of the 
Census, Washington, D.C. 
Blocking considerations for record linkage 
under conditions of uncertainty. Pp. 602-605 
in American Statistical Association 1984 
Proceedings of the Social Statistics Section. 
Washington, D.C.: American Statistical 
Association. 

Nathan 
Information and allocation: Two uses of the 
1980 census. The American Statistician 
33(2):45-50. 
Facing the fact of census incompleteness. Pp. 
7-36 in Proceedings of the 1980 Conference on 
Census Undercount. Bureau of the Census. 
Washington, D.C.: U.S. Department of Commerce. 

i 
Samples and censuses. International 
Statistical Review 47(2):99-109. 

mder 
Cyclical fluctuations in the difference 
between the payroll and household measures of 
employment. Survey of Current Business 
59(5):14-44 and 55. 



Unpublished manuscript. Statistical Research 
Division, Bureau of the Census, Washington, 
D.C. 
Huang, E.T., and Fuller, W.A. 

1978 Nonnegative regression estimation for sample 
survey data. Pp. 300-303 in American 
Statistical Association 1978 Proceedings of 
the Social Statistics Section. Washington, 
D.C.: American Statistical Association. 
Huntington, E.V. 

1921 The mathematical theory of the apportionment 
of representatives. Proceedings of the 
National Academy of Sciences USA 7:123-127. 
Ireland, C.T., and Kullback, S. 

1968 Minimum discrimination information estimation. 

Biometrics 24:707-713. 
James, W., and Stein, C. 

1961 Estimation with quadratic loss. Pp. 361-379 
in Proceedings of the Fourth Berkeley 
Symposium on Mathematical Statistics and 
Probability. Berkeley: University of 
California Press. 
Jaro, Matthew 

1984a Record Linkage Research Plan. Unpublished 

paper. Bureau of the Census, Washington, D.C. 
1984b Record linkage research and the calibration of 
record linkage algorithms. Pp. 599-601 in 
American Statistical Association 1984 
Proceedings of the Social Statistics Section. 
Washington, D.C.: American Statistical 
Association. 
1985 Current Record Linkage Research. Paper 

presented at the April 1985 meeting of the 
Census Advisory Committee, Washington, D.C. 
Johnson, Bruce E. 

1984 Interim Census Manager Reports on 1986 Pretest 
Objectives. Unpublished paper. Bureau of the 
Census, Washington, D.C. 
Kackar, Raghu, and Harville, David A. 

1984 Approximations for standard errors of 

estimators of fixed and random effects in 
mixed linear models. Journal of the American 
Statistical Association 79:853-862. 



17(3):251-284. 

.kelson, Gordon 

1984 Results of the Enumerator-Supplied Roster 
Portion of the Update List/Leave Procedures 
Evaluation. Preliminary Evaluation Results 
Memorandum No. 76. Bureau of the Census, 
Washington, D.C. 

kelson, Gordon, and McKelvey, Karen 

1983 Preliminary Results from Administrative 

Records for the Update List/Leave Experiment 
Program Procedures Evaluation. Preliminary 
Evaluation Results Memorandum No. 66. Bureau 
of the Census, Washington, D.C. 

kura, Susan M., and Thompson, John H. 

1983 1980 census findings and their implications 
for 1990 census planning. Pp. 353-360 in 
American Statistical Association 1983 
Proceedings of the Social Statistics Section. 
Washington, D.C.: American Statistical 
Association. 

kura, Susan M., Woltman, Henry, and Thompson, John 

1984 Uses of sampling for the census count. Pp. 
458-463 in American Statistical Association 
1984 Proceedings of the Social Statistics 
Section. Washington, D.C.: American 
Statistical Association. 

kovak, William 

1982a Analysis of Item Nonresponse in the Alternative 
Questionnaires Experiment. Preliminary 
Evaluation Results Memorandum No. 19. Bureau 
of the Census, Washington, D.C. 

1982b Analysis of the Effect of Questionnaire Length 
on Item Nonresponse. Preliminary Evaluation 
Results Memorandum No. 25. Bureau of the 
Census, Washington, D.C. 

1983 Comparison of Data Obtained Using Alternative 

Questionnaires in the 1980 Census. Preliminary 
Evaluation Results Memorandum No. 56. Bureau 
of the Census, Washington, D.C. 

National Center for Health Statistics 

1982a Vital Statistics of the United States, 1978. 
Vol. II, Mortality. Part A, Section 6, 
Technical Appendix. Washington, D.C.: U.S. 
Department of Health and Human Services. 



Proceedings of the Social Statistics 
Section. Part 2. Washington, D.C.: American 
Statistical Association. 
Lesko and Associates 

1975 Final Report; Basic Data and Guidance 
Required to Implement a Major Illegal Alien 
Study During Fiscal Year 1976. Washington, 
D.C.: U.S. Department of Justice. 
Levine, D.B., Hill, K., and Warren, R., eds. 
1985 Immigration Statistics, A Story of Neglect. 
Panel on Immigration Statistics, Committee on 
National Statistics, National Research Council. 
Washington, D.C.: National Academy Press. 
Liebow, E. 

1967 Tally's Corner, A Study of Negro Streetcorner 
Men. Boston: Little, Brown and Co. 
Lindley, D.V., and Smith, A.F.M. 

1972 Bayes estimates for the linear model. Journal 
of the Royal Statistical Society 
34:1-19. 
Marks, Eli S., and Waksberg, Joseph 

1966 Evaluation of coverage in the 1960 census of 
population through case-by-case checking. Pp. 
62-70 in the American Statistical Association 
1966 Proceedings of the Social Statistics 
Section. Washington, D.C.: American 
Statistical Association. 

Marks, Eli S., Seltzer, William, and Krotki, K 
1977 Population Growth Estimation; A Handbook of 
Vital Statistics Measurement. New York: 
Population Council. 
Marks, Jennifer 

1980 Census Bureau data sources on immig 
Pp. 15-20 in American Statistical Association 
1980 Proceedings of the Social Statistics 
Section. Washington, D.C.: American 
Statistical Association. 
Matchett, Stanley D. 

1984 Major Objectives and Priorities for 
Pretest. Unpublished paper. Bureau of the 
Census, Washington, D.C. 



Demographic analysis: A report on work in 
progress. Pp. 160-165 in American Statistical 
Association 1984 Proceedings of the Section on 
Social Statistics. Washington, D.C.: 
American Statistical Association. 
Jeffrey S., Siegel, Jacob S., and Robinson, J. 

Coverage of the National Population in the 
1980 Census, by Age, Sex, and Race; 
Preliminary Estimates by Demographic Analysis. 
Current Population Reports, Special Studies 
P-23, No. 115. Bureau of the Census. 
Washington, D.C.: U.S. Department of Commerce. 

Jeffrey S., and Woodrow, Karen 
The judicial basis for enumeration of 
undocumented aliens in the 1980 census and 
implications for 1990. Pp. 464-469 in 
American Statistical Association 1984 
Proceedings of the Social Statistics Section. 
Washington, D.C.: American Statistical 
Association. 

.H., Hines, J.E., and Nichols, J.D. 
The use of auxiliary variables in capture- 
recapture and removal experiments. Biometrics 
40(2):329-340. 

iel O. 

A check on underenumeration in the 1940 census. 
American Sociological Review 12(1):44-49. 

hilip 

A Study of the Future of the Census of 
Population: Alternative Approaches. 
Unpublished paper commissioned by the 
Statistical Office of the European Communities. 

tha Farnsworth 

1984 demographic services directory. American 
Demographics 6(1):32-41. 
The state of the states' data centers. 
American Demographics 6(10):28-31, 36-37. 

J.G. 

Estimating the approximate size of the illegal 
alien population in the United States by the 
comparative trend analysis of age-specific 
death rates. Demography 17(2):159-176. 



1972 America's Uncounted People. Advisory Committee 
on Problems of Census Enumeration. Washington, 
D.C.: National Academy of Sciences. 

1978 Counting the People in 1980: An Appraisal of 
Census Plans. Panel on Decennial Census Plans, 
Committee on National Statistics, National 
Research Council. Washington, D.C.: National 
Academy of Sciences. 

1984 Planning the 1990 Census; Priorities for 
Research and Testing, Interim Report. Panel 
on Decennial Census Methodology, Committee on 
National Statistics, National Research Council. 
Washington, D.C.: National Academy Press. 

Nichols, Roy 

1948 The Disruption of American Democracy. New 
York: Macmillan. 
Nisselson, H. 

1974 Request for Policy Guidance on Participant 
Observer Studies. Unpublished memorandum. 
Bureau of the Census, Washington, D.C. 
Office of Management and Budget 

1985 Major Themes and Additional Budget Details 
1983. Washington, D.C.: U.S. Government 
Printing Office. 

Oh, H. Lock, and Scheuren, Fritz 

1978 Some unresolved application issues in raking 
ratio estimation. Pp. 723-728 in American 
Statistical Association 1978 Proceedings of 
the Section on Survey Research Methods. 
Washington, D.C.: American Statistical 
Association. 

Palloni, Alberto 

1985 The Population of Illegal Aliens: Data 
Sources, Methods, and Estimates. Unpublished 
paper. Center for Demography and Ecology, 
University of Wisconsin. 

Passel, Jeffrey S. 

1983 Affidavit Presented to District Court, 
Southern District of New York, Mario Cuomo, et 
al., Plaintiff(s), Malcolm Baldrige, et al., 
Defendants, 80 Civ. 4550 (JES). 



Studies Center, University 
Sekar, C.C., and Deming, W.E. 
1949 On a method of estimating birth and death 
rates and the extent of registration. Journal 
of the American Statistical Association 
44:101-115. 
Siegel, Jacob S. 
1970 Coverage of population 
Preliminary findings 
64-69 in American Statistical 
Proceedings of the Social 
Washington, D.C.: American 
Association. 

1974 Estimates of the coverage 
sex, race, and age in the 
Demography 11(1):1-23. 

1975 Coverage of Population in 
Some Implications for Pu 
Population Reports, Special 
56. Bureau of the Census 
U.S. Department of Commerce 

Siegel, J.S., and Passel, J.S. 

1979 Coverage of the Hispanic 
United States in the 197 
Methodological Analysis 
Reports, Special Studies 
of the Census. Washington 
Department of Commerce. 

Siegel, Jacob S., Passel, Jeffrey S 
Jr., and Robinson, J. Gregory 
1977 Developmental Estimates 
Population of States in 
Demographic Analysis. C 
Reports, Special Studies 
of the Census. Washington 
Department of Commerce. 

Siegel, Jacob S., and Zelnik, Melvin 
1966 An evaluation of coverage 
of population by techniques 
analysis and by composite 
in American Statistical 
Proceedings of the Social 
Washington, D.C.: American 
Association. 



Statistical Association 1979 Proceedings of 
the Social Statistics Section. Washington, 
D.C.: American Statistical Association. 

Rothwell, Naomi D. 

1983 New Ways of Learning How to Improve 
Enumerative Questionnaires: A Demonstration 
Project. Unpublished paper. Bureau of the 
Census, Washington, D.C. 

Rubin, D.B. 

1978 Multiple imputations in sample surveys - A 
phenomenological Bayesian approach to 
nonresponse (with discussion and reply). Pp. 
1-9 in Imputation and Editing of Faulty or 
Missing Survey Data. Washington, D.C.: 
Social Security Administration and Bureau of 
the Census. 

Richard, and Windham, Bernard M. 
The Importance of Bias Removal in 
of United States Census Counts. Unpublished 
paper. Department of Statistics, 
State University. 

, ,iv \ o* * I tnmard *I . 

* l *'*4 f ^j f lJtHH^ N( 

w i I **y . 

Marvin G. 

1980 The use of focus group interviews 
the design of an administrative f 
study at the Social Security Administration. 
Pp. 347-352 in American Statistical Association 
1980 Proceedings of the Survey Research Methods 
Section. Washington, D.C.: American 
Statistical Association. 
Marvin G., and Nelson, William J., Jr. 
collection and analysis of data o 
ethnicity questions in social security 
applications. Pp. 353-357 in American 
Statistical Association 1980 Proceedings of 
the Survey Research Methods Section. 
Washington, D.C.: American Statistical 
Association. 



jerenaants, ou 

Discussion of "Estimating the population in a 
census year: 1980 and beyond." Journal of 
the American Statistical Association 80:98-131. 

Frederick Jackson 
The Significance of the Frontier in American 
History. Proceedings of the 41st meeting of 
the State Historical Society. Madison, Wis.: 
State Historical Society. 

Marshall L., Jr. 
1980 Census Mail Response Rates. 1990 
Decennial Census Informational Memorandum No. 
15. Bureau of the Census, Washington, D.C. 
1980 Census Mail Response Rates. 1990 
Decennial Census Informational Memorandum No. 
15, Revision 1. Bureau of the Census, 
Washington, D.C. 

House of Representatives 
Impact of Budget Cuts on Federal Statistical 
Programs. Hearing before the Subcommittee on 
Census and Population of the Committee on Post 
Office and Civil Service. 97th Congress, 2nd 
Session (March 16, 1982), Serial No. 97-41. 
Washington, D.C.: U.S. Government Printing 
Office. 

Charles, and Valentine, Betty Lou 
Missing Men, A Comparative Methodological 
Study of Underenumeration and Related Problems. 
Unpublished paper. Bureau of the Census, 
Washington, D.C. 

Joseph 
Analysis of Synthetic Estimates. Unpublished 
memorandum. Bureau of the Census, Washington, 
D.C. 
Mean Square Error of Revisions in Population 
Count from Vacancy Check. Unpublished 
memorandum. Bureau of the Census, Washington, 
D.C. 

Joseph, Hanson, Robert, and Bounpane, Peter 
Estimation and Presentation of Sampling Errors 
for Sample Data from the 1970 Census. Paper 
presented at the 39th session of the 
International Statistical Institute, Vienna, 
Austria. 



the Social Statistics Section. Washington, 
D.C.: American Statistical Association. 
Slavson, J.R. 

1979 Dynamics of Group Psychotherapy. New York: 
Jason Aronson, Inc. 
Spencer, Bruce 

1980a Benefit-Cost Analysis of Data Used to Allocate 
Funds. Lecture Notes in Statistics 3. New 
York: Springer-Verlag. 

1980b Implications of equity and accuracy for 
undercount adjustment: A decision-theoretic 
approach. Pp. 204-216 in Proceedings of the 
1980 Conference on Census Undercount. Bureau 
of the Census. Washington, D.C.: U.S. 
Department of Commerce. 
Starsinic, Donald E. 

1983 Evaluation of Population Estimation Procedures 
for States, 1980: An Interim Report. Current 
Population Reports, Population Estimates and 
Projections P-25, No. 933. Bureau of the 
Census. Washington, D.C.: U.S. Department of 
Commerce. 

Strauss, Robert P., and Harkins, Peter B. 

1974 The 1970 Census Undercount and Revenue Sharing: 
Effects on Allocations in New Jersey and 
Virginia. Unpublished paper. Joint Center 
for Political Studies, Washington, D.C. 

Tippett, J., and Takei, R. 

1983 Evaluation of Reporting of Utility Costs for 
Selected Cities. Preliminary Evaluation 
Results Memorandum No. 59. Bureau of the 
Census, Washington, D.C. 

Thompson, John 

1984 Preliminary Summary Results from the 1980 
Census Coverage Improvement Program 
Evaluations. Preliminary Evaluation Results 
Memorandum No. 85. Bureau of the Census, 
Washington, D.C. 

Tukey, John W. 

1981 Discussion of Paper by Bailar and Keyfitz. 

Paper presented to the American Statistical 
Association, Detroit, Michigan. 



(Chair) is a professor at the Harvard 
aduate School of Business. He received an 
iceton in 1952 and a Ph.D. in statistical 
bhods from Stanford in 1956. He is a former 
le Social Science Research Council and is a 

American Statistical Association, the 
>ciety, and AAAS. He is interested 
Ln statistical theory and methods and is a 
>cial Experimentation; A Method for 
Svaluating Social Intervention. He is a 
Committee on National Statistics and 
>mmittee on National Statistics' study group 
;al monitoring. 

JAN CAFFERTY is a professor of social 
.stration at the University of Chicago. She 
i. from St. Bernard College in 1962 and an 
D. in American literary and cultural 
'1 from George Washington University. Her 
des the effects of immigration on urban 
of census statistics. 

ITRO is an ASA/Census research fellow at 
LU of the Census and served as study 
his study. She received a B.A. from the 
Rochester in 1963 and an M.A. and a Ph.D. 
cience from Yale University in 1964 and 
vely. She is a former vice president and 
r of Mathematica Policy Research, Inc., and 
tor of the Information Documentation Center 
er research has included many projects to 
fulness and accessibility of large, complex 
s as well as analysis related to income 
d demographic change. 




America t Pittsburgh, Pa. 
Warren, Robert, and Peck, Jennifer Marks 

1980 Foreign-born emigration from the United States: 
1960-1970. Demography 17(1):71-84. 
Wolter, Kirk M. 

1983 Affidavit Presented to District Court, Southern 
District of New York, Mario Cuomo, et al., 
Plaintiff(s), Malcolm Baldrige, et al., 
Defendants, 80 Civ. 4550 (JES). 

1984 Requisite planning and research r 
decision on census adjustment for 
presented at the October 1984 meeting of the 
Census Advisory Committee, Washington, D.C. 

Yuskavage, Robert, Hirschberg, David, and Scheuren, 
Frederick J. 

1977 The impact on personal and family income of 
adjusting the Current Population Survey for 
undercoverage. Pp. 70-80 in American 
Statistical Association 1977 Proceedings of 
the Social Statistics Section. Washington, 
D.C.: American Statistical Association. 



.ng, and econometrics. He is a member of the 
National Statistics and was a member of its 
:istics for rural development. 

)ANE is a professor of statistics and social 
Carnegie-Mellon University. He received a 
from Harvard University and a Ph.D. in 
:heory and methods in 1966 from Stanford 
He is the editor of Robustness of Bayesian 
is the applications and coordinating editor 
il of the American Statistical Association. 
>er of the American Statistical Association's 
lei on the census undercount and is 
xticularly in statistical inference in the 
:es and computational complexity. 

TNG is the director of survey methods at the 
testing Service in Princeton, New Jersey. He 
..B. in 1958, an M.B.A. in 1960, and a Ph.D. 
:hodology in 1964 from the University of 
is a former professor of quantitative 
ie School of Business Administration and a 
sor in the Department of Statistics at the 
: Washington. His research interests include 
s, logic, applications of statistics in 
economics, and policy research. 

KY is associate dean for Ph.D. studies and 
business administration in the Graduate 
iness of the University of Chicago. He 
A. in 1952, an M.S. in 1955, and a Ph.D. in 

statistics in 1958 from the University of 
has been a research mathematician at the 
ion, senior vice president of a large 
gency, president of a computer software and 
ng firm, professor and chairman of computer 
ity College of New York, and a fellow of the 
vanced Study in the Behavioral Sciences. He 

of Foundations of Econometrics and has done 
ultivariate statistical analysis. 

>NI is an associate professor in the 
Sociology and a research associate at the 



analysis, regression, and sample design and estimation. 

ANSLEY J. COALE is a professor of demography at Princeton 
University. He received a B.A. in 1939, an M.S. i n $+< 
and a Ph.D. in demography in 1947 from Princeton. He is 
a former chairman of the Committee on Population and 
Demography of the National Research Council. He is the 
coauthor of New Estimates of Population and Births in the 
United States, Regional Model Life Tables and Stable 
Population, and The Growth and Structure of Human 
Populations. 

DONALD R. DESKINS, JR., is professor of urban geography 
and sociology and associate dean of the Horace H. Rackham 
School of Graduate Studies at the University of Michigan. 
He received a B.A. in 1960, an M.A. in 1963, and a Ph.D. 
in geography in 1970 from the University of Michigan. He 
serves as a contributing editor to Urbanism Past and 
Present and is a member of numerous professional 
societies. His most recent research is focused on the 
analysis of academic degree production in the United 
States and its public policy implications. 

IVAN P. FELLEGI is the deputy chief statistician of
Statistics Canada. He received a B.Sc. from the
University of Budapest in 1956 and an M.Sc. in 1958 and a
Ph.D. in survey methodology in 1961 from Carleton
University. He is a former director of the Sampling and
Survey Research Staff and a former director general of
the Methodology and Systems Branch of Statistics Canada.
He is current president of the International Association
of Survey Statisticians and president-elect of the
International Statistical Institute. He has published
extensively in the areas of census and survey methodology
and was a member of the Committee on National Statistics'
panel on privacy and confidentiality as factors in survey
response.

WAYNE A. FULLER is a distinguished professor in the
Department of Statistics at Iowa State University. He
received a B.S. in 1955, an M.S. in 1957, and a Ph.D. in
statistical theory and methods in 1959 from Iowa State



Index 



Address lists
checking programs, 222; see
also Advance Post Office
Check; Casing and Time of
Delivery Checks;
Post-Enumeration Post
Office Check; Precanvass;
Prelist; Update
list/leave enumeration
procedure
development of, for 1980
census, 88-91
Adjustment of census figures,
29-33, 275-318
and carrying down to small
areas, 294, 303-307, 312
and combining estimates
from different programs,
300-301
and error estimates, 289-292
factors affecting, 30-31
fast matching algorithms
in, 329
and fund allocations, 66,
67, 68-70
hierarchical Bayesian
method in, 302-303,
315-317
and impact of census
errors, 53
imputation in, 294, 305-306
and internal consistency of
census figures, 31,
292-295



and iterative proportional 

fitting, 22, 95, 118-119, 

304-305 
and loss functions, 

278-288, 313-314 
and modification of 

estimates within one 

program, 301-303 
need for, 17, 275-278 
procedures for, 298-307, 312 
protection against extreme 

adjustments, 288-289 
recommendations for, 14, 

15, 32-33, 297-298, 

310-312 
regression techniques in, 

302, 303, 306-307, 311 
research and testing of 

plans for, 19, 104, 106, 

307-312 
synthetic estimation in, 

119, 303-304, 318 
time constraints in, 31-32, 

296-298, 346 

and use of domains, 301-302 
weighting in, 294 
Administrative records, 10, 

15, 151-157, 166, 319 
for content collection and 

improvement, 262-265 
from driver's license 

records, 10, 133 
effectiveness of, 222 
for hard-to-count groups, 

323-324 
for housing items, 19, 

265-267 



associate professor in the Department of Sociology and
research associate in the Population Research Center at
the University of Texas. He has published in the fields
of population studies and demography.

JOHN E. ROLPH is a senior statistician and associate head
of the Economics Department of the Rand Corporation as
well as a faculty member of the Rand/UCLA Health Policy
Studies Center. He received an A.B. in 1962 and a Ph.D.
in statistics in 1966 from the University of California,
Berkeley. He has taught at the University of London,
Columbia University, the Rand Graduate Institute, and the
University of California, Los Angeles. He is the coauthor
of Introduction to Data Analysis and Statistical Inference
and, among other areas, is interested in empirical Bayes
methods, sequential decision problems, actuarial methods,
and jury representativeness.

COURTENAY M. SLATER is the president of CEC Associates,
Washington, D.C. She received a B.A. from Oberlin
College in 1955, and an M.A. and a Ph.D. in economics
from American University in 1965 and 1968, respectively.
She is a former staff economist with the Council of
Economic Advisers and a former chief economist of the
Department of Commerce. She is a member of the Committee
on National Statistics and has been involved in the
planning of decennial censuses.

JOSEPH WAKSBERG is the vice president and director of the
statistical staff at Westat, Inc. He received a B.S. in
mathematics from City College of New York in 1936. He is
a former associate director for methodology and research
at the U.S. Bureau of the Census. His fields of interest
include survey methods, sampling theory, and sampling
practice. He is the author of numerous publications in
sample design for surveys and survey methodology.



jram for 1990, 
>03 

>sed persons 
Ln 1970 
), race and 
questions in, 

), race and 
questions in, 



analysis in, 135 
micity 

in, 206 

: content in, 258 
), 98 
aluation in, 

analysis in, 
srage in, 187, 

bions in, 240 
imeration 
L25-127, 186, 

micity 

in, 206 
: content in, 258 

in, 177, 184, 

232-236 
), 98-99 
aluation in, 

analysis in, 

*rage in, 240-241 

Current 

i Survey, 232 

:ions in, 239-240 

jmeration 

L27-128, 186, 

micity 
in, 206 

c studies in, 128 
3rd check in, 



coverage evaluation in, 

132-133 
coverage improvement 

programs in, 187-194, 214 

cost and effectiveness 

of, 192-194, 214 
demographic analysis in, 

138-139 

gross omissions in, 236 
gross overenumerations in, 

239 
housing coverage in, 187, 

240-242 
matched to Current 

Population Survey, 

132-133, 236 
race and ethnicity

questions in, 206 
record check studies in, 133 
sampling for content in, 258 
undercounts in, 177, 

185-186, 236 
Census of 1980 

address list development 

in, 88-91 
coverage improvement 

programs in, 93-95, 

195-202, 214, 216 

cost and effectiveness 

of, 200-202, 214 
data processing in, 95-97 
demographic analysis in, 

97, 148-149 
enumeration in, 91-92 
follow-up in, 92-93, 254-255 
and formation of 12 

estimates of 

undercoverage, 145-148 
hard-to-count groups in, 

177-184 
housing coverage in, 240, 

242 
matched to Current 

Population Survey, 123, 

141-147, 224, 337 
matching with Internal 

Revenue Service records, 

183-184, 229-231 



hierarchical method,
in adjustments, 302-303,
315-317
Benchmarking, and combining
information from
different programs, 3
and error estimates, 2
Correlation bias;
Independence, assumption
of; Dual-system estimation
Birth registration records,
137, 138, 149
of, 163-1
population; see Race,
errors in and
of; Race and
ethnicity questions
and matching
172-173, 3
planning uses
of counts for small
44
of characteristics data
47

procedures in
record check,
123, 131-132, 182,
in, 61
334
Capture-recapture
estimators
and Time of Delivery
112, 198






See also Adjustment of 

census figures; Small 

area data and estimates 
Coverage 

errors in, 120-121; see 

also Errors 
evaluation of, 120-175; see 

also Evaluation programs 
improvement measures for; 

see Improvement programs 
and need for mid-decade 

census, 16, 20, 51 
review of problems in, 13-17 
and role of estimation, 

10-11 

in sample census, 249 
See also Overcount; 

Undercount 
Coverage Questions and 

Dependent Roster Check, 

93, 200, 203, 214-216 
See also Report of Living 

Quarters Check 
Criticisms of censuses, 6 
Cross-product ratio, 168 
See also Correlation bias; 

Dual-system estimation;

Independence, assumption 

of 
Cross-tabulation of data, and 

confidentiality 

considerations, 46, 49 
Curbstoning, 121, 143 
Current Population Survey, 42 
matching to 1960 census, 232
matching to 1970 census, 

132-133, 236 
matching to 1980 census, 

123, 141-147, 224, 337 
use for 1990 census, 347 



Data processing, for census 

of 1980, 95-97 
See also Iterative 
proportional fitting; 
Imputation 
Death registration records, 

149-150 
completeness of, 164 



Demographic analysis, 34, 35, 

122, 135-139 

in census of 1950, 135-136 
in census of 1960, 136-137 
in census of 1970, 138-139 
in census of 1980, 97, 

148-149 

components of, 163-164 
and estimates of net 

undercount, 178-180, 

184-186 
plans for 1990 census, 

330-333 
of population over age 65, 

150 
of population under age 65, 

149-150 

prior to 1950, 135 
problem areas in, 150-151, 

320 
See also Immigrants; 

Medicare data 
Denmark, census procedures 

in, 102-103 
Differential coverage errors; 

see Errors; Overcount; 

Undercount 
Direct estimate coverage 

estimation programs, 

160-163 
Driver's license records; see 

Administrative records 
Dual-system estimation, 

140-147, 162-163, 

167-168, 337, 342 
and inclusion 

probabilities, 140-141, 

170, 345 

stratification in, 141, 344 
See also Correlation bias; 

Independence, assumption 

of 
Duplications in the census; 

see Overcount 



E Sample; see Evaluation 

programs 
Education programs, fund 

allocations for, 60 






methodology in, 88-97 
overenumerations in, 178,

238-239 
and Post-Enumeration 

Program, 97, 123, 

139-148; see also 

Post-Enumeration Program 

in census of 1980 
and protest by New York 

City, 124, 154-157, 169 
race and ethnicity 

questions in, 206, 207-208 
recommendation for review 

of data in 1990 census 

planning, 21-22, 114 
sampling for content in, 258 
undercount in, 177, 

179-187, 224-231 
Census of 1990 

administrative records in, 

203 
automation technologies in, 

347-348 
demographic analysis in, 

330-333 
evaluation programs for,

319-360 
and post-enumeration 

program, 335-349 
and pre-enumeration survey, 

346-347 
reverse record check in, 

333-335 

time factors in, 346-348 
and variance estimation in 

matching process, 348-349 
Central cities, undercounts 

in; see Urban areas 
Characteristics data 

for comparative studies, 

46-48 

restrictions on, 48-49 
for small areas and 

subgroups, 46-49 
Compact area cluster samples, 

in post-enumeration 

surveys, 326-327, 343 
Comparative studies 

analysis of data across 

time periods, 50-51 
characteristics data in, 

46-48 



Composite list format: 

123-124, 152-154, 
in New York City cas 

155-156, 169 

Comprehensive Employme 

Training Act, fund 

allocations for, 6 

Computer use; see Auto 

technologies 
Congressional represen 
and need for censu 
see Reapportionmen 
Consistency of data, 
adjustments affect 
31, 292-295 
Content items 

accuracy improvement 
administrative rec 
262-265 

decisions for inclus 
48-49, 259, 260-26 
errors in, 121-122, 
problems with, 18-19 
sampling for, 257-26 
Conventional enumerati 
92-94, 99, 107, 22 
242 

See also Post-Enumer 
Post Office Check 
Cooperation with censu 

programs, 188 
pre-enumeration surv 

affecting, 347 
Correlation bias, 140, 
See also Dual-systen 
estimation; Indepe 
assumption of 
Costs 

of census of 1980, E 

of coverage evaluati 

of coverage improven 

programs, 23, 192, 

201, 222 

and efficacy of proc 
programs for reduct] 

10 
sampling affecting, 

248, 268-269 
of systematic obsen 

360 

County data and estinu 
errors in, 71-72 






and balancing of 

undercount with 

overcount, 341-343 
demographic analysis in, 

330-333 
and improvements in 

Post-Enumeration 

Program, 338-349 
and inclusion 

probabilities for two 

lists, 343-346 
level of nonresponses in, 

338-339 
percentage of unresolved 

matches in, 339-341 
and position paper by 

Wolter, 308, 325-330 
pretest plans in, 321-330 
reverse record check in, 

333-335 
resident observation in, 

350-351 
systematic observation 

in, 351-354 

time factors in, 346-348 
and variance estimation 

in matching process, 

348-349 
record check studies in; 

see Administrative 

records 

research on, 19
reverse record checks 

in; see Reverse record 

checks 
timing considerations in, 

157-158, 346-348 



Follow-up procedures, 107 
in census of 1980, 92-93 
sampling in, 26, 250-255, 

341 
telephone use in, 26, 27,

254-255 
Forward Trace Study, 35, 161, 

324-325, 335 
See also Reverse record 

checks 

Fractional matching, 340 
France, census procedures in, 

101 
Fund allocations 

and adjustments for income 

and population, 68 
characteristics data in, 47 
for education, 60 
for employment and 

training, 61 
errors in count affecting, 

45, 59-70 
formula-based programs in, 

65-66 

for housing, 62 
and income reporting 

errors, 68, 70 
loss functions affecting,

279, 285-286 
and need for census data, 

7, 16, 39-40, 43, 76-77, 

78, 79 

for public assistance, 63 
for recreation, 63 
for revenue sharing, 61, 

67-69 

for social services, 63 
for transportation, 64 



Focus group method 
for development of 

questions, 211-212 
for development of coverage 
improvement procedures,
223 
Follow-on survey considered 

for 1990 census, 259-261 
one-form census with, 
261-262 



Geocoding, 161-162 

avoidance of, 340 
Geographic regions 

gross omissions in, 236 
and housing coverage 

omissions, 241 
Germany, census procedures 
in, 101 






Educational status, and gross 

omission rates, 233-234 
Emigration, estimates of, 

134, 150, 164, 331, 332 
Employment 

fund allocations for, 61 
equal opportunity programs 

for, and state uses of 

census data, 77 
Employment status, and gross 

omission rates, 234 
Enumeration process, in 

census of 1980, 91-92 
Errors 

and adjustment of counts, 

29-33, 53, 276 
in census of 1980, 9-10 
in content, 121 
in county population 

estimates for 1980, 71-72 
in coverage, 120-121; see 

also Overcount; Undercount 
criticisms of, 6 
estimations of, 289-292 
and evaluation of coverage, 

120-175; see also 

Evaluation programs 
factors in, 23 
in hard-to-count groups, 

24-25 

impact of, 51-70 
in income reporting, 264 
and profiles for coverage 

evaluation methods, 157, 

158-165 

in sampling, 245 
in subcounty population 

estimates for 1980, 72, 

73, 74 
Estimation 

of census errors, 289-292 
compared with coverage 

improvement, 10-11 
dual-system; see Dual- 
system estimation 
errors affecting, 70-74 
multiple list methods in, 

152-154, 166-170 
synthetic, 119, 303-304, 318 
triple-system, 344-345 
Ethnicity; see Race and 

ethnicity questions 



Evaluation progn 

120-175 
administrative 

in, 151-157, 
in census of 1! 
in census of 1< 
in census of 1! 
composite list 

123-124, 152- 
cost of, 158 
criteria for, : 
Current Populal 

samples in; i 

Population S\ 
demographic ani 

135-139, 320 
direct estimate 
E sample in, 1; 

337 
error profile : 

in, 157, 158' 
and formation < 

estimates of 

undercoverag< 

145-148 
and independen 

for populate 

134-135 
macro-level me 

122 

prior to 198 
megalist metho< 
methods of, 12 
micro-level me 1 

122-133 
multilist meth 
multiple list 

methods, 152 
need for, 15 
P sample in, 1 
and Post-Enume 

Program for < 

1980, 97, 12 
post-enumerati< 

in, 33-35, 1 

Post-enumera 
prior to 1980, 

133-139 
recommendation 

census, 33-3 

assessment o 
319-321 






for household roster, 

214-216 
for movers or groups with 

second homes, 216-217 
need for, 15 
and needed research on 
undercount and overcount, 
25, 204-205 
priorities for research and 

testing in, 221 
proposals for, 10-13 
race and ethnicity 

information in, 205-214 
recommendations for 1990, 

202-204, 220-223 
and resident observer 

studies, 236-237 
sampling in, 255-257 

recommendations for 1990, 

256 
Imputation 

and adjustments, 294, 

305-306 

multiple, 116-117 
sequential hot-deck, 22, 

95, 115-117 
Inclusion probabilities for 

two lists, 343-346 
Income 

and fund allocations, 68, 70 
and gross omission rates, 

230-231, 234 
reporting errors, 264 
Independence, assumption of, 

140, 162-163, 170, 343-346 
See also Correlation bias; 

Dual-system estimation 
Independent estimates for 

population groups, 134-135 
Independent observers, use 

of, 35 
See also Systematic 

observation 

Independent surveys, 326, 343 

matched to census data, 123 

Indian reservations, coverage 

of, 106-107 

Infant Enumeration Study, 136 
Internal Revenue Service 

records, 42 

matching with census of 
1980, 183-184, 229-231 



Iterative proportional 

fitting, 22, 95, 118-119, 
304-305 

and consistency of data, 294 
See also Adjustment of 
census figures 



Jersey City, census pretest 
in, 105, 110 



parameter; see 
Correlation bias; 
Dual-system estimation; 
Independence, assumption 
of 



Legal aspects 

of data for reapportionment 
and redistricting, 32, 43 
of deadlines for census 

data, 296 

of mid-decade census, 59 
of race and ethnicity 

questions, 205 
of sampling, 246 
List samples, in post- 
enumeration surveys, 125, 
127 
Local agency uses of census 

data, 79-80 
Local area data and 

estimates; see Small area 
data and estimates 
Local lists, use of, 124 
Local review of preliminary 

data for 1980, 198 
Long-form/short-form

questionnaires, 258 
alternatives to, 261-262 
experience with short form, 

24 

related data in, 95-96, 
118-119 






Government planning uses of 
characteristics data, 
46-47 
Government uses for census 

data, 7-8 

by federal government, 60-64 
by local governments, 79 
by states, 76-77, 78 
Grant programs, data needed 

for, 43 

See also Fund allocations 
Great Britain, census 

procedures in, 100-101 
Gross omissions; see 

Undercount 

Gross overenumerations; see 
Overcount 



H 



Hard-to-count groups, 177-187 
experience in 1980, 177-184 
experience from previous 

censuses, 184-187 
focus groups in, 211-212, 

223 
and procedure to improve 

coverage, 217-220 
proposals for, 24-25 
research on, 204-205, 

323-324 

Health Interview Survey, 42 
Hierarchical Bayesian method 
in adjustments, 302-303, 
315-317 

Hispanics; see Race, coverage 
errors in and estimates 
of; Race and ethnicity 
questions 
Historical aspects of 

censuses, 1-2 
in centennial census of 

1890, 2, 3 

and criticisms of data, 6 
in first census of 1790, 2 
review from 1950 to 1980, 22 
and uses of data, 39-40 
Hot-deck imputation, 

sequential; see Imputation 
Household coverage, 186-187, 
188-189 



in census of 1970, 1< 

194, 214 

in census of 1980, 2( 
and children living 

elsewhere, 217-219 
gross omission rates 

240-242 
and mail delivery pr< 

in multiunit struci 

107 

in program for 1990, 
resident observer sti 

in, 186-187, 236-2! 

350-351 

Household relationship! 
and gross omissions, 
and gross overenumer< 

238 
Housing, fund allocati< 

for, 62 

Housing structure data 
and data from 

administrative reo 

265-267 
recommendations for 

improvement in, 28 

273-274 



Immigrants 

illegal, estimates of,

150-151, 163, 180, 

332, 355-359 
legal, records of, 1 
Improvement programs,

107, 176-242 
for accuracy of coun 
in census of 1970, 1 

costs and effectiv 

of, 192-194, 198-2 
in census of 1980, 9 

195-202 

costs and effectiv 

of, 200-202 
for children not par 

household, 217-219 
costs of, 192, 200, 
for hard-to-count gr 

177-187 






dual system; see Dual- 
system estimation 

hazards of, 170 

multilists in, 169-170 
Multiplicity surveys, 124, 
217-220 



N 



National Center for Health 

Statistics, 213, 214 
National Content Test 

of 1976, 211 
planned for 1986, 112 
National Research Council 
Advisory Committee on 
Problems of Census 
Enumeration, 13-14 
National Research Council 

Panel on Decennial Census 
Plans, 14 
National Vacancy Check of 

1970, 121, 133, 189, 256, 
291 

effectiveness of, 193 
See also Vacant/Delete 

check 
Native-American enumeration 

techniques, 106-107 
Net coverage errors; see 

Overcount; Undercount
Netherlands, census 

procedures in, 102 
Network coverage improvement 

questions, 124, 217-220 
New Jersey 

census data use in, 81-86 
fund allocations in, 68 
Jersey City census pretest, 

105, 110 
New York City 

administrative records used 

in 1980, 200 
protest of 1980 census, 

124, 154-157, 169 
Nonhousehold Sources 
Program; see 
Administrative records 
Nonresponses 

reduction in, 338-339 
and unresolved matches, 
160-161 



Oakland pretest in 1977, 124, 

217-218 
Omissions of data; see 

Undercount 
One-form census with 

follow-on survey, 261-262 
Overcount, 238-240 

balancing with undercount, 

163, 341-343 
in census of 1980, 9 
findings from earlier 

censuses, 239-240 
findings from Post- 
Enumeration Program of 
1980, 183, 238-239 
research needed on, 204-205 
sources of, 143 



P Sample; see Evaluation 

programs 

Participant observer studies; 
see Resident observer 
studies; Systematic 
observation 
Planning cycle for 1990 

census, 4-5, 111 
Postal Service checking of 

address lists 
in census of 1970, 121, 
133, 189, 256 
effectiveness of, 193 
in census of 1980, 198 
Postcensal estimates, errors 

affecting, 70-74 
Post-Enumeration Post Office 

Check 
in 1970, 99, 121, 133, 189, 

193-194, 256 

in 1980, 94, 198-199, 203 
Post-Enumeration Program in 
census of 1980, 97, 123, 
139-148 

blocking in, 173 
cost of, 158 

E sample in, 123, 142-143, 
337 






Loss functions, and 

adjustment procedures, 
278-288, 313-314 
relative squared error in, 

283 

squared relative error in, 
283 



M 



Macro-level coverage 

evaluation, 22, 122 
Mail delivery problems, in 

multiunit structures, 107 
Mail nonreturn rates, 182, 

228-229, 238-239, 251, 271 
Mailing lists, use of, 89 
Mailout-mailback enumeration, 

2-3 
coverage errors in, 183, 

227, 242 

use in 1970, 99, 189-192 

use in 1980, 91-193, 198 

Maryland, fund allocations 

in, 68 
Matching, 162 

of administrative records 
to census data, 22, 123 
automated procedures in, 348 
blocking used in, 172-173 
of Current Population 

Survey to census records; 
see Current Population 
Survey 

fractional, 340 
general algorithm of, 172 
of independent surveys to 

census data, 123 
mathematical model for, 

174-175 
operational difficulties 

of, 171-172 
three-way match with 

Census/CPS/IRS, 329-330 
variable selection in, 173 
variance estimation in, 

348-349 
Medicare data, 150 

for demographic estimates 
of elderly population, 
134, 137-139, 149-150, 331 



for estimates of el 

population, 134, 
matched to 1970 cen 

records, 133 
Megalist method, 124, 
Merging of lists; see 
Composite list fo 
Methodology 

in Australia, 100 
balance of procedun 
in Canada, 100 
in census of 1950, < 
in census of I960, ! 
in census of 1970, < 
in census of 1980, { 
in coverage evaluati 

122-124 
in coverage improven 

176-242 

in Denmark, 102-103 
in France, 101 
in Germany, 101 
in Great Britain, 10 
independent reviews < 

13-17 

in Netherlands, 102 
and plans for 1990, ; 

104-109 
and review of pretest 

plans, 109-114 
in Sweden, 102 
Micro-level coverage 

evaluation, 22, 122 
Mid-decade census 

advantages of, 73, 74 
legal aspects of, 59 
need for, 16, 20, 51 
Missed persons campaign 

1970, 192, 194 
See also Casual Count 
Movers 

checked in 1970, 192 

effectiveness of, 1< 
proposed coverage of, 

216-217 

Multilist methods, 169-1 
Multiple list estimation 
methods, 152-154, 16 
composite lists in, 

123-124, 152-156, 161 
covariate information 
170 






Race, coverage errors in 

and estimates of, 9, 53-54,

57-58, 67, 126, 135-138, 

151, 177-187, 226-231, 

234-238 
Race and ethnicity questions, 

205-214 
in census of 1980, 206, 

207-208 
comparability of data from, 

212-214 
considerations for 1990, 

209-210 

development of, 23-24 
in earlier censuses, 206 
and focus group methods, 

211-212, 223 
and mail nonreturn rates, 

182 
and overenumeration rates, 

183, 238 
prescribed categories in, 

207 

problems with, 332 
and undercount rates, 

178-179 

Raking ratio estimation, 119 
See also Iterative 

proportional fitting 
Reapportionment 

deadlines for data in, 32, 

43 
errors affecting, 10, 45, 

54-56 
loss functions affecting, 

281, 285-286 
and need for census data, 

7, 10 
Record checks 

administrative records in; 

see Administrative records 
reverse; see Reverse 

record checks 
Record linkage for automated 

matching, 105, 174-175 
Recreation, fund allocations 

for, 63 
Redistricting 

deadlines for data in, 32, 

43 



errors affecting, 56-59 
and state uses of census 

data, 75-76 
Refusals for interviews, rate 

of, 338 
Regions, undercounts in, 126, 

181, 227, 236, 241-242 
Regression techniques in 

adjustments, 302, 303, 

306-307, 311 
Representation in Congress,

and need for census data; 

see Reapportionment 
Research 

on adjustments, 19, 307-312 
on coverage evaluation, 19 
on coverage improvement 

programs, 221 
on hard-to-count groups,

323-324 

on impact of errors, 51-70 
on loss functions and 

adjustment, 285-288 
plans for 1990, 104-109 
priorities for 1986, 21, 

110-113, 220-223 
on responses to content 

items, 264-265 
and review of pretest 

plans, 109-114 
on sampling, 253-254 
testing schedule in, 4-5, 

111 
on undercount and 

overcount, 204-205 
Residence rules, 105, 216 
Resident observer studies, 

124, 186-187, 236-237, 

350-351 
See also Systematic 

observation 
Revenue sharing, fund 

allocations for, 61, 67-69 
Reverse record checks, 34, 123 
in Canada, 123, 131-132, 

182, 299, 319 

in census of 1960, 129-130 
plans for 1990 census, 35, 

333-335 

problems with, 327, 328 
Reviews of census plans by 

independent sources, 13-17 






gross omissions in, 

180-182, 224-229 
gross overenumerations in, 

183, 238-239 
matching with Current 
Population Survey, 123, 
141-147, 224, 337 
P sample in, 123, 142, 337 
possible improvements in, 

338-349 

problem areas in, 299 
procedures in, 336-338 
and state population 

estimates, 55-56 
Post-Enumeration Program 

planned for 1990, 320 
Post-enumeration surveys, 

33-35, 123 
in census of 1950, 125-127, 

186, 232-236 
in census of 1960, 127-128, 

186, 232-236 

pretest in 1985, 322-323 
problems with, 33, 328 
See also Area samples, in 
post-enumeration surveys; 
Compact area cluster 
samples, in post- 
enumeration surveys; List 
samples, in post- 
enumeration surveys; 
Pre-enumeration surveys 
Precanvass 

in census of 1970, 189, 

192-193, 198 

in census of 1980, 90, 198 
in test program for 1990, 

203 
Pre-enumeration surveys, 33, 

346-347 
Preferred estimates of 

undercounts, 137, 138
Prelist 

enumeration areas in 1970, 

189, 198 
enumeration areas in 1980, 

89, 94, 198-199 
recanvass in 1980, 94, 
198, 199, 203 



Pretests 

in Oakland in 1977 , 

217-218 
plans for 1985, 10! 

109-110, 322 
plans for 1986, 21, 

106-108, 110-113 

216-217, 220, 24' 

256, 260, 307-301 
plans for 1990, 4, 

321-330 
in Tampa in 1985, '. 

212, 322-324 
Public assistance, f 

allocations for, 
Public cooperation w 

census, 188 
pre-enumeration su 

affecting, 347 
Public planning uses 
of basic counts fo 

areas, 44 
of census data in, 
Purposes or uses of 

7-8, 20, 37-86 
and basic counts f 

areas, 41-46 
and changes in tin 

periods, 50-51 
characteristics da 

46-49 
distinguishing fee 

40-51 

by federal governr 
historical uses ir 
and impact of errc 
by local agencies , 
reassessment of, ' 
by states, 75-78 



Questionnaire desigi 
content, 48-49, 
205-220, 257-26 
See also Content 
Short-form/long 
questionnaires 



Supplemental forms operation 

in 1970, 192 
effectiveness of, 194 
Sweden, census procedures in, 

102 
Synthetic estimation, and 

adjustments, 119, 

303-304, 318 
Systematic observation, 

351-354 
variance and cost estimates 

for, 360 



Tampa pretest in 1985, 105, 

110, 212, 322 
Tape Address Register 
Enumeration Areas 
in 1970, 189, 192, 198 
in 1980, 89-91, 198 
Telephone follow-up 

procedures, 26, 27, 
254-255 
Testing programs; see 

Pretests 
Time-series character of 

census, 50-51 
Timing considerations 
in adjustments, 31-32, 

296-298, 346 
in automation, 347-348 
in coverage evaluation, 

157-158 
in pre-enumeration surveys, 

346-347 

in sampling, 245 
Tracing methods, in Forward 

Trace Study, 161, 325, 335 
Transportation, fund 

allocations for, 64 
Triple-system estimation, 

344-345 
Two-stage enumeration 

procedure 
testing of, in 1985, 105, 

110 
use in 1960 census, 98 



U 



Undercount 

balance with overcount, 

163, 341-343 
in census of 1980, 9, 

180-182, 224-231 
and coverage evaluation,

120-175; see also 

Evaluation programs 
findings from previous 

censuses, 231-237 
and fund allocations, 68-69 
gross omissions in, 224-237 
in housing coverage 

studies, 240-242 
and reapportionment, 54-56 
research needed on, 204-205 
sources of, 120, 144 
and values assigned to 

missing responses, 22 
Update list/leave enumeration 

procedure, 91-92, 107, 

222-223 
Urban areas 

housing coverage studies 

in, 241 
undercounts in, 126, 

227-229, 236 
Uses for census data; see 

Purposes or uses of census 



Vacant/Delete check 

in census of 1980, 198-199, 
256, 257 
effectiveness of, 

201-202, 203, 222 
and recommended use of 

sampling, 257 
See also National Vacancy 

Check of 1970 
Variance estimation, 291 

in matching process, 348-349 
Vital statistics records, 43 
birth registration, 136, 
137, 138, 149, 331 






Rural areas 

enumeration procedures, 107 
gross omissions in, 236 
housing coverage studies 
in, 241 



Sampling procedures, 10, 15, 

18, 243-262 
for census enumeration, 

243-246, 247-250 
for content items, 257-262 
costs of, 245, 248, 268-269 
coverage deficiency in, 249 
for coverage improvement, 

255-257 

recommendations for 1990, 

256 

errors in, 245 
feasibility of, 245 
in follow-up studies, 26, 

250-255, 270-272, 341 
recommendations for, 26-28 
and reduction of respondent 

burden, 246 

research on, 104, 253-254 
rolling samples in, 247, 249 
and time required for field 

work, 245 

for unresolved cases, 341 
See also Post-enumeration 

surveys 
Second-home groups, proposed 

coverage of, 216-217 
Sequential hot-deck 

imputation, 22, 95, 

115-117 
Sex category, undercounts in, 

178-179 
Short-form/long-form

questionnaires, 258 
alternatives to, 261-262 
experience with short form, 

24 
related data in, 95-96, 

118-119 

Small area data and estimates 
adjustment in, 276, 279-282, 

288-289, 293-295, 299, 

303-307 



administrative record 

42 

basic counts for, 41- 
characteristics data 

46-49 
comparison and rank in 

44 

sampling affecting, 4 
See also County data 

estimates; State da 

estimates 
Social Security 

administration reco 
Social services, fund 

allocations for, 63 
Socioeconomic research i 

characteristics dat 

47-48 
Standard metropolitan
statistical areas,
undercounts in; see Urban
areas
State government uses o 

census data, 75-78 
and case study of dat, 

in New Jersey, 81-8' 
for classification of 

governments, 76 
for equal employment 

opportunity program! 
for fund allocations, 
for implementation of 

federal programs, 7: 

for planning purposes 

for redistricting, 75- 

State data and estimate! 

10, 55-56, 71, 151, 

285-286, 296-297, 3( 

321, 331 
See also Adjustment o: 

census figures; Sma! 

area data and estim< 
Stratification in dual-! 

estimation, 141, 34 - 
Subcounty population 

estimates for 1980, 

errors in, 72, 73, 
Subgroup characteristic! 

data, 46-49 
Subject items; see Cont 

items 



death records, 149-150, 164
race and ethnicity
definitions in, 213

W

Weighting
and adjustments, 294
of long-form records,
iterative proportional
fitting in,
Were You Counted
200, 222
See also Supplemental forms
operation in 1970
Whole Household Usual Home
Elsewhere Program,
201-203, 217
Wolter paper on
evaluation a
308, 310, 32



